Updated for Python 3 (2016.01.25). Updated for MALSS specification changes (2020.02.08).
I made a tool called MALSS (Machine Learning Support System) to support machine learning in Python (PyPI / [GitHub](https://github.com/canard0328/malss)). I have already written an introduction and a basics article; this one covers advanced usage.
We use the same data as last time. Calling the fit method normally takes a while because it also builds the models, so set the algorithm_selection_only option to True to run algorithm selection only.
python
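from malss import MALSS
import pandas as pd
# Load the same Heart data set as in the previous article
data = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Heart.csv',
                   index_col=0, na_values=[''])
# Separate the target column (AHD) from the features
y = data['AHD']
del data['AHD']
cls = MALSS('classification', random_state=0,
            lang='jp')
# Select algorithms only; skip the time-consuming modeling step
cls.fit(data, y, algorithm_selection_only=True)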
Once you are used to MALSS, you may want to change which algorithms it considers.
MALSS makes it easy to use the algorithms available in scikit-learn.
First, list the algorithms currently under consideration.
python
for name, param in cls.get_algorithms():
    print(name)
python
Support Vector Machine (RBF Kernel)
Random Forest
Support Vector Machine (Linear Kernel)
Logistic Regression
Decision Tree
k-Nearest Neighbors
Let's remove Random Forest. Pass the index of the algorithm you want to remove; if no index is given, the last algorithm is removed.
python
cls.remove_algorithm(1)
for name, param in cls.get_algorithms():
    print(name)
python
Support Vector Machine (RBF Kernel)
Support Vector Machine (Linear Kernel)
Logistic Regression
Decision Tree
k-Nearest Neighbors
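Because `remove_algorithm` takes a position rather than a name, it can help to print the index next to each name before removing anything. The sketch below uses a plain list of `(name, params)` tuples as a stand-in for what `cls.get_algorithms()` returns. The `algorithms` list, its empty parameter entries, and the `find_index` helper are placeholders for illustration, not part of MALSS.

```python
# Stand-in for cls.get_algorithms(): a list of (name, params) tuples.
# The empty lists are placeholders, not the real MALSS parameter grids.
algorithms = [
    ('Support Vector Machine (RBF Kernel)', []),
    ('Random Forest', []),
    ('Support Vector Machine (Linear Kernel)', []),
    ('Logistic Regression', []),
    ('Decision Tree', []),
    ('k-Nearest Neighbors', []),
]


def find_index(algorithms, target):
    """Return the position of the algorithm named `target`, or None."""
    for i, (name, _param) in enumerate(algorithms):
        if name == target:
            return i
    return None


# Print each algorithm together with its index
for i, (name, _param) in enumerate(algorithms):
    print(i, name)

idx = find_index(algorithms, 'Random Forest')
print(idx)
```

With a real MALSS object, the same loop over `cls.get_algorithms()` should give the index to pass to `cls.remove_algorithm`.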
Let's add Extremely Randomized Trees. The first argument of *add_algorithm* is an instance of the estimator, and the third is the name of the algorithm (optional). The second is a little harder to understand: it specifies the parameters and their ranges for the grid search, as a list of dictionaries in which each key is a parameter name and each value is the range to search. For the available parameters, refer to the scikit-learn documentation. The estimator does not have to be a scikit-learn algorithm; it can also be your own estimator, as long as it has *fit* and *predict* methods.
python
from sklearn.ensemble import ExtraTreesClassifier
cls.add_algorithm(ExtraTreesClassifier(n_jobs=3, random_state=0),
                  [{'n_estimators': [10, 30, 50],
                    'max_depth': [3, 5, None],
                    # 'auto' was removed in scikit-learn 1.3; 'sqrt' is its equivalent for classifiers
                    'max_features': [0.3, 0.6, 'sqrt']}],
                  'Extremely Randomized Trees')
for name, param in cls.get_algorithms():
    print(name)
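To get a feel for what the second argument means, the following sketch expands a grid shaped like the one above into individual candidate settings with `itertools.product`. It is plain Python written for illustration, not how MALSS or scikit-learn implement the search, and `expand_grid` is a hypothetical helper name.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per parameter combination in a list-of-dicts grid."""
    for grid in param_grid:
        names = sorted(grid)
        # Cartesian product of all value ranges in this dictionary
        for values in product(*(grid[name] for name in names)):
            yield dict(zip(names, values))


param_grid = [{'n_estimators': [10, 30, 50],
               'max_depth': [3, 5, None],
               'max_features': [0.3, 0.6, 'sqrt']}]

candidates = list(expand_grid(param_grid))
print(len(candidates))  # 3 * 3 * 3 = 27 combinations
print(candidates[0])
```

Every candidate has to be trained and evaluated, so the number of runs multiplies with each range you add. Keeping each range to a few well-spread values is usually a reasonable starting point.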
The rest is the same as last time.
python
cls.fit(data, y, 'classification_result')
cls.make_sample_code('classification_sample_code.py')
Check the grid search results in the report. If the parameter value with the best evaluation score sits at either end of its range, you need to widen (slide) the range. MALSS does not currently have a method for changing the grid search parameters, so remove the algorithm and add it back with the new range, in the same way as changing algorithms above.
Over three articles (introduction, basics, and advanced usage), I have introduced MALSS, my machine learning support tool. There is still plenty of room for improvement, so I would be grateful for your feedback.
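The check described above can be sketched in a few lines. `params_on_edge` is a hypothetical helper, not part of MALSS. It assumes you copy the best parameters from the report into a dictionary by hand, and it only looks at numeric values, because entries such as `None` or strings have no position in an ordered range. The values in `best_params` are made up for illustration.

```python
from numbers import Number


def _is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, Number) and not isinstance(value, bool)


def params_on_edge(grid, best_params):
    """Return names of parameters whose best value is the smallest
    or largest numeric value in its search range."""
    on_edge = []
    for name, values in grid.items():
        numeric = sorted(v for v in values if _is_number(v))
        if len(numeric) < 2:
            continue  # nothing to compare against
        best = best_params.get(name)
        if _is_number(best) and best in (numeric[0], numeric[-1]):
            on_edge.append(name)
    return on_edge


grid = {'n_estimators': [10, 30, 50], 'max_depth': [3, 5, 7]}
best_params = {'n_estimators': 50, 'max_depth': 5}  # made-up values

print(params_on_edge(grid, best_params))
```

In this made-up case only `n_estimators` is flagged, so a next run might try a range that extends past 50, for example `[50, 100, 200]`.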