Introduction

PyCaret 2.0 has been released as a nightly build, so I tried it out.
For an introduction to PyCaret itself, please see my earlier post:
I tried to visualize the model with the low-code machine learning library "PyCaret"
v2 appears to add preprocessing for imbalanced data, so that is what I would like to try here.

How to try

Install it with pip.

pip install pycaret-nightly

This installs the 2.0.0 nightly build.

Trying it out

Preprocessing for imbalanced data

v2 adds preprocessing for imbalanced data in binary classification, that is, data in which one class (positive or negative) has very few examples.
Enabling it is simple: pass fix_imbalance=True as an argument to setup().

from pycaret.classification import *

# data: a pandas DataFrame that contains the binary target column 'default'
exp1 = setup(
    data,
    target='default',
    fix_imbalance=True  # Add this line
)

With this setting, the imbalanced data is resampled as part of preprocessing.

Preprocessing performed (SMOTE)

SMOTE (Synthetic Minority Over-sampling TEchnique) appears to be the default.
Qiita has explanatory articles on SMOTE, so I will link to them:
Data expansion: Try increasing a small number of data with SMOTE
Explanation: Oversampling method explanation (SMOTE, ADASYN, Borderline-SMOTE, Safe-level SMOTE)
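
To make the idea behind SMOTE concrete, here is a minimal sketch in NumPy. It is a simplified illustration of the interpolation step only, not the implementation used by imbalanced-learn or PyCaret; the function name smote_sketch and the toy data are hypothetical.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the line segment between a minority
    sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                   # base samples
    nb = neighbours[base, rng.integers(0, k, size=n_new)]   # one neighbour each
    gap = rng.random((n_new, 1))                            # factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


# Toy minority-class samples, for illustration only
X_minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
synthetic = smote_sketch(X_minority, n_new=6, k=2)
print(synthetic.shape)
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region the minority class already occupies, unlike simple duplication by random over-sampling.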

Other preprocessing methods

As mentioned at the beginning, ADASYN and Random Over Sampler also appear to be supported.
Internally, imbalanced-learn has been added to the dependencies in v2 (nightly build).
The docstring also contains the following description.

fix_imbalance_method: obj, default = None
When fix_imbalance is set to True and fix_imbalance_method is None, 'smote' is applied 
by default to oversample minority class during cross validation. This parameter
accepts any module from 'imblearn' that supports 'fit_resample' method.

How to specify other preprocessing methods

Following the docstring above, I will specify an imblearn class.
Import the desired over-sampling algorithm from imblearn.over_sampling and pass an instance of it to fix_imbalance_method.

from pycaret.classification import *
from imblearn.over_sampling import ADASYN, BorderlineSMOTE, KMeansSMOTE, RandomOverSampler, SMOTE, SMOTENC, SVMSMOTE
exp1 = setup(
    data, 
    target = 'default',
    fix_imbalance=True,
    fix_imbalance_method=ADASYN() #Specified on this line
)

Algorithms that could be specified

The following algorithms could be specified:
imblearn.over_sampling.ADASYN
imblearn.over_sampling.BorderlineSMOTE
imblearn.over_sampling.KMeansSMOTE
imblearn.over_sampling.RandomOverSampler
imblearn.over_sampling.SMOTE (default)
imblearn.over_sampling.SVMSMOTE

Improvements to the display when evaluating models

Along with the preprocessing for imbalanced data, MCC (Matthews Correlation Coefficient) has been added to the list of evaluation metrics.
The F-measure works well when the minority class is treated as the positive class, but MCC evaluates performance on imbalanced data appropriately even when that choice has not been made, which is a welcome addition.
For the relationship between the F-measure and MCC on imbalanced data, this blog post is helpful, so I will link to it.
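
The difference can be seen with a small worked example. The sketch below uses made-up confusion-matrix counts (90 majority and 10 minority samples) and computes both metrics from their standard definitions, once with the minority class as positive and once with the majority class as positive.

```python
from math import sqrt


def f1(tp, fp, fn, tn):
    # F-measure ignores true negatives, so it depends on which class is positive
    return 2 * tp / (2 * tp + fp + fn)


def mcc(tp, fp, fn, tn):
    # MCC uses all four cells and is symmetric in the two classes
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts with the minority class treated as positive
minority_pos = dict(tp=5, fp=5, fn=5, tn=85)
# Same predictions with the majority class as positive: tp<->tn, fp<->fn
majority_pos = dict(tp=85, fp=5, fn=5, tn=5)

print(round(f1(**minority_pos), 3), round(f1(**majority_pos), 3))    # 0.5 0.944
print(round(mcc(**minority_pos), 3), round(mcc(**majority_pos), 3))  # 0.444 0.444
```

The F-measure jumps from 0.5 to 0.944 just by changing which class counts as positive, whereas MCC stays at 0.444 for the same predictions.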

Finally

In this article, I introduced v2's support for imbalanced data.
Beyond this, MLflow support also appears to be planned, and I am looking forward to the official release of v2.
This was a rough write-up, but thank you for reading to the end.

[PYTHON] I tried PyCaret2.0 (pycaret-nightly)