[PYTHON] I tried PyCaret2.0 (pycaret-nightly)
Introduction
How to try
pip install pycaret-nightly
- You can install the nightly build of version 2.0.0 with pip.
Trying it out
Preprocessing for imbalanced data
- v2 adds preprocessing for imbalanced data in binary classification, that is, data where one of the two classes (positive or negative) has far fewer examples than the other.
- Enabling it is simple: pass **fix_imbalance=True** as an argument to setup().
from pycaret.classification import *
exp1 = setup(
    data,
    target='default',
    fix_imbalance=True,  # Add this line
)
- With this setting, the imbalanced data is resampled as part of preprocessing.
Preprocessing performed (SMOTE)
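To give a feel for what SMOTE does, here is a minimal pure-Python sketch. It is a simplified illustration and not the imbalanced-learn implementation that PyCaret relies on: the function name `smote_sketch`, the toy data, and `k=3` are all assumptions made for this example. The idea is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority-class neighbors.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: build n_new synthetic points from minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbors of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Random point on the line segment between base and the chosen neighbor
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy data (hypothetical): 20 majority points and 4 minority points
majority = [(float(i), float(i % 5)) for i in range(20)]
minority = [(0.0, 10.0), (1.0, 11.0), (2.0, 10.5), (1.5, 12.0)]

# Generate enough synthetic minority points to match the majority class size
new_points = smote_sketch(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region the minority class already occupies, rather than being exact duplicates.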
Other preprocessing methods
- As mentioned in the article referenced at the beginning, **ADASYN** and **Random Over Sampler** also appear to be supported.
- Internally, v2 (the nightly build) adds imbalanced-learn to its dependencies.
- The docstring also has the following description.
fix_imbalance_method: obj, default = None
When fix_imbalance is set to True and fix_imbalance_method is None, 'smote' is applied
by default to oversample minority class during cross validation. This parameter
accepts any module from 'imblearn' that supports 'fit_resample' method.
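The key requirement in this docstring is the `fit_resample` method, which takes the features and labels and returns resampled versions of both. The sketch below shows that contract with a toy random oversampler written in plain Python. The class `NaiveRandomOverSampler` is hypothetical and exists only for illustration. I have not verified that PyCaret accepts objects from outside imblearn, so in practice you would pass an imblearn class.

```python
import random
from collections import Counter


class NaiveRandomOverSampler:
    """Hypothetical sampler: duplicates minority rows until classes are balanced."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def fit_resample(self, X, y):
        # Same shape of contract as imblearn samplers: (X, y) in, (X_res, y_res) out
        counts = Counter(y)
        target = max(counts.values())
        X_res, y_res = list(X), list(y)
        for label, count in counts.items():
            rows = [row for row, lab in zip(X, y) if lab == label]
            # Re-draw existing rows of this class until it reaches the target size
            for _ in range(target - count):
                X_res.append(self.rng.choice(rows))
                y_res.append(label)
        return X_res, y_res


# Toy data (hypothetical): 8 negatives and 2 positives
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = NaiveRandomOverSampler().fit_resample(X, y)
print(Counter(y_res))
```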
How to specify other preprocessing methods
- Following the docstring above, I'll pass an imblearn class.
- Import the oversampling algorithm you want from imblearn.over_sampling and pass an instance of it as fix_imbalance_method.
from pycaret.classification import *
from imblearn.over_sampling import ADASYN, BorderlineSMOTE, KMeansSMOTE, RandomOverSampler, SMOTE, SMOTENC, SVMSMOTE
exp1 = setup(
    data,
    target='default',
    fix_imbalance=True,
    fix_imbalance_method=ADASYN(),  # Specified on this line
)
Algorithms that could be specified
Improved display when evaluating models
- Along with the preprocessing for imbalanced data, MCC (Matthews Correlation Coefficient) has been added to the list of evaluation metrics.
- F-measure works well when the minority class is treated as the positive class, but MCC evaluates performance on imbalanced data appropriately even when that choice has not been made, which is a welcome addition.
- For the relationship between F-measure and MCC on imbalanced data, this blog post is a helpful reference, so I'll link to it.
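The difference between F-measure and MCC can be checked with a small calculation. The sketch below uses a made-up confusion matrix (10 minority and 90 majority samples; the counts are assumptions for illustration, not results from PyCaret) and computes both metrics twice: once with the minority class as positive, and once with the labels swapped.

```python
import math


def f1(tp, fp, fn, tn):
    """F-measure (F1); note that it ignores true negatives."""
    return 2 * tp / (2 * tp + fp + fn)


def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def swap_positive(cm):
    """Treat the other class as positive: TP<->TN and FP<->FN."""
    return dict(tp=cm["tn"], fp=cm["fn"], fn=cm["fp"], tn=cm["tp"])


# Hypothetical predictions with the minority class as the positive class
minority_positive = dict(tp=6, fp=8, fn=4, tn=82)
# The same predictions, but with the majority class treated as positive
majority_positive = swap_positive(minority_positive)

for name, cm in [("minority positive", minority_positive),
                 ("majority positive", majority_positive)]:
    print(f"{name}: F1={f1(**cm):.3f}  MCC={mcc(**cm):.3f}")
```

F1 changes substantially depending on which class is called positive, while MCC is identical in both cases because its formula is symmetric in the two classes.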
Finally
- In this article, I introduced v2's support for imbalanced data.
- In addition, mlflow support appears to be planned, and I am looking forward to the official release of v2.
- This is a rough article, but thank you for reading to the end.