MALSS, a tool that supports machine learning in Python (advanced)

Supports Python 3 (2016.01.25). Added support for MALSS specification changes (2020.02.08).

I made a tool called MALSS (Machine Learning Support System) to support machine learning in Python (PyPI / [GitHub](https://github.com/canard0328/malss)). I have already written the introduction and basics articles; this one covers advanced usage.

Preparation

Use the same data as last time. A normal call to the fit method takes time because it also builds the models. Here, set the algorithm_selection_only option to True so that only algorithm selection is performed.

python


from malss import MALSS
import pandas as pd
data = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Heart.csv',
                   index_col=0, na_values=[''])
y = data['AHD']
del data['AHD']
cls = MALSS('classification', random_state=0,
            lang='jp')
cls.fit(data, y, algorithm_selection_only=True)

Algorithm change

As you get used to it, you may want to change the algorithms under consideration.

In MALSS, you can easily use the algorithms available in scikit-learn.

Get algorithm list

First, get the list of algorithms currently under consideration.

python


for name, param in cls.get_algorithms():
    print(name)

python


Support Vector Machine (RBF Kernel)
Random Forest
Support Vector Machine (Linear Kernel)
Logistic Regression
Decision Tree
k-Nearest Neighbors

Algorithm removal

Let's delete Random Forest. Specify the index of the algorithm you want to delete; indices count from 0, so Random Forest is index 1 in the list above. If no index is specified, the last algorithm is deleted.

python


cls.remove_algorithm(1)
for name, param in cls.get_algorithms():
    print(name)

python


Support Vector Machine (RBF Kernel)
Support Vector Machine (Linear Kernel)
Logistic Regression
Decision Tree
k-Nearest Neighbors

Algorithm addition

Let's add Extremely Randomized Trees. The first argument of add_algorithm is an instance of the estimator, and the third (optional) is the name of the algorithm. The second is a little harder to understand: it specifies the parameters and their ranges for the grid search, as a list of dictionaries in which each key is a parameter name and each value is the list of values to try. For the parameters, refer to the scikit-learn documentation. The estimator does not have to be a scikit-learn algorithm; an original estimator with fit and predict methods can also be used.
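To make the second argument more concrete, here is a minimal sketch of what a grid of this shape describes. It uses only the standard library; the helper name expand_grid and the parameter values are assumptions for illustration, and the sketch is not how MALSS or scikit-learn implement the search internally.

```python
from itertools import product

# Same shape as the second argument of add_algorithm: a list of dictionaries,
# each mapping a parameter name to the candidate values to try.
# The values here are illustrative only.
param_grid = [{'n_estimators': [10, 30, 50],
               'max_depth': [3, 5, None]}]


def expand_grid(grid):
    """Return every parameter combination described by a list of dicts.

    Hypothetical helper for illustration; not part of MALSS.
    """
    combinations = []
    for block in grid:
        names = sorted(block)
        # Cartesian product of the candidate lists, one value per parameter.
        for values in product(*(block[name] for name in names)):
            combinations.append(dict(zip(names, values)))
    return combinations


candidates = expand_grid(param_grid)
print(len(candidates))   # 3 values x 3 values = 9 combinations
print(candidates[0])
```

Every extra parameter multiplies the number of combinations, so it is worth keeping each range short.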

  • Don't forget to import the algorithm

python


from sklearn.ensemble import ExtraTreesClassifier
cls.add_algorithm(ExtraTreesClassifier(n_jobs=3, random_state=0),
                  [{'n_estimators': [10, 30, 50],
                    'max_depth': [3, 5, None],
                    # 'sqrt' replaces 'auto', which recent scikit-learn versions no longer accept
                    'max_features': [0.3, 0.6, 'sqrt']}],
                  'Extremely Randomized Trees')
for name, param in cls.get_algorithms():
    print(name)
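As for an original estimator with fit and predict methods, the following is a minimal sketch of what such a class might look like. The class name MajorityClassifier and its fallback parameter are hypothetical. The get_params and set_params methods follow the usual scikit-learn estimator convention so that a grid search can copy the object. Whether MALSS requires anything beyond this (for example a score or predict_proba method) is not confirmed here; in practice, inheriting from sklearn.base.BaseEstimator and ClassifierMixin is the more common route.

```python
from collections import Counter


class MajorityClassifier:
    """Toy estimator (hypothetical): always predicts the most frequent label."""

    def __init__(self, fallback=None):
        # Constructor arguments are stored unchanged, following the
        # scikit-learn estimator convention.
        self.fallback = fallback

    def get_params(self, deep=True):
        return {'fallback': self.fallback}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self, X, y):
        labels = list(y)
        if labels:
            # Most frequent label seen during training.
            self.majority_ = Counter(labels).most_common(1)[0][0]
        else:
            self.majority_ = self.fallback
        return self

    def predict(self, X):
        # One prediction per input row.
        return [self.majority_] * len(X)


model = MajorityClassifier().fit([[0], [1], [2]], ['No', 'Yes', 'No'])
print(model.predict([[5], [6]]))
```

An instance of such a class would then be passed as the first argument of add_algorithm, together with a grid over its own parameters.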

Analysis execution

The rest is the same as last time.

python


# Pass the same feature DataFrame (data) that was prepared above
cls.fit(data, y, 'classification_result')
cls.make_sample_code('classification_sample_code.py')

Change grid search parameters

If the grid search results in the report show that the best evaluation score was obtained at a value at the end of a parameter's range, the range needs to be widened (shifted). MALSS does not currently have a method for changing grid search parameters directly, so change the range in the same way as changing an algorithm above: remove the algorithm and add it again with the new range.
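The "best value at the end of the range" check can be sketched as a small helper. The function name params_at_grid_edge and the best-parameter values below are made up for illustration; the real values have to be read from the MALSS report. The check only makes sense for parameters whose candidates are listed in order.

```python
def params_at_grid_edge(param_grid, best_params):
    """Return names of parameters whose best value is the first or last candidate.

    Hypothetical helper for illustration; not part of MALSS.
    """
    at_edge = []
    for name, candidates in param_grid.items():
        # A range with a single candidate has no meaningful edge.
        if len(candidates) < 2 or name not in best_params:
            continue
        if best_params[name] in (candidates[0], candidates[-1]):
            at_edge.append(name)
    return at_edge


grid = {'n_estimators': [10, 30, 50], 'max_depth': [3, 5, 7]}
best = {'n_estimators': 50, 'max_depth': 5}  # made-up values for illustration
print(params_at_grid_edge(grid, best))
```

When a parameter is flagged this way, remove the algorithm with remove_algorithm and add it again with add_algorithm using a shifted range, for example [30, 50, 100] for n_estimators.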

Conclusion

I have introduced my machine learning support tool MALSS in three parts: introduction, basics, and advanced usage. There are still many areas that need improvement, so I would be grateful for your feedback.
