In Python, machine learning usually means scikit-learn, but using scikit-learn requires installing large libraries such as numpy and scipy. If you only want to embed classification by a trained model into a program, you may not want to pull in such heavy dependencies. I wanted a way to use what was learned with scikit-learn for classification without needing numpy, scipy, or scikit-learn at run time.
So this time I will introduce a way to generate if-then style code from scikit-learn's decision tree (DecisionTreeClassifier) and random forest (RandomForestClassifier).
It creates the code of a function that takes feature values as input and returns the class number. The generated code is annotated with the number of samples at each branch point, so you can see which branches matter. By the way, I wrote this with reference to code that was posted on Stack Overflow.
Code like the following is generated from a decision tree trained on the iris data.
def f(sepal_length=0, sepal_width=0, petal_length=0, petal_width=0):
    """
    0 -> setosa
    1 -> versicolor
    2 -> virginica
    """
    if petal_width <= 0.800000011921: # samples=150
        return 0 # samples=50
    else:
        if petal_width <= 1.75: # samples=100
            if petal_length <= 4.94999980927: # samples=54
                if petal_width <= 1.65000009537: # samples=48
                    return 1 # samples=47
                else:
                    return 2 # samples=1
            else:
                if petal_width <= 1.54999995232: # samples=6
                    return 2 # samples=3
                else:
                    if petal_length <= 5.44999980927: # samples=3
                        return 1 # samples=2
                    else:
                        return 2 # samples=1
        else:
            if petal_length <= 4.85000038147: # samples=46
                if sepal_length <= 5.94999980927: # samples=3
                    return 1 # samples=1
                else:
                    return 2 # samples=2
            else:
                return 2 # samples=43
- In the case of a decision tree
- In the case of Random Forest
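For the decision tree case, a minimal sketch of such a generator might look like the following. It assumes a fitted `DecisionTreeClassifier` and reads only its `tree_` arrays; the names `tree_to_code` and `demo` are hypothetical and not from the original code. Each leaf returns the index of its majority class, which equals the class number only when the labels are 0, 1, 2, ... as in iris.

```python
def tree_to_code(clf, feature_names, func_name="f"):
    """Return Python source for an if-then function equivalent to a fitted tree.

    `clf` is assumed to be a fitted DecisionTreeClassifier. Only its `tree_`
    arrays are read, so scikit-learn is needed when generating the code,
    not when running the generated code.
    """
    t = clf.tree_
    args = ", ".join(name + "=0" for name in feature_names)
    lines = ["def {}({}):".format(func_name, args)]

    def walk(node, depth):
        pad = "    " * depth
        samples = int(t.n_node_samples[node])
        if t.children_left[node] == -1:  # -1 marks a leaf in the tree arrays
            counts = list(t.value[node][0])  # per-class weights, single output
            label = max(range(len(counts)), key=counts.__getitem__)
            lines.append("{}return {}  # samples={}".format(pad, label, samples))
            return
        name = feature_names[t.feature[node]]
        threshold = float(t.threshold[node])
        lines.append(
            "{}if {} <= {!r}:  # samples={}".format(pad, name, threshold, samples)
        )
        walk(t.children_left[node], depth + 1)   # branch where the test is true
        lines.append(pad + "else:")
        walk(t.children_right[node], depth + 1)  # branch where the test is false

    walk(0, 1)  # node 0 is the root; depth 1 is the function body
    return "\n".join(lines) + "\n"


def demo():
    """Train on iris and print the generated source (requires scikit-learn)."""
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
    names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    print(tree_to_code(clf, names))
```

Calling `demo()` on a machine with scikit-learn prints a function in the same style as the iris example; the printed source can then be saved to a `.py` file and used where scikit-learn is not installed. The exact thresholds and sample counts depend on the scikit-learn version and on `random_state`.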
For Random Forest, a little more work is needed: code is generated for each of the trees, and a majority vote over their results is written on top.
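As a rough sketch of that idea, the code below emits one function per tree in a fitted `RandomForestClassifier` (via its `estimators_` list) and a wrapper that takes a plain majority vote. The names `forest_to_code` and `_tree_lines` are hypothetical. Note one assumption: this is a hard vote, whereas scikit-learn's own `predict` averages the class probabilities of the trees, so the two can disagree on borderline inputs.

```python
def _tree_lines(tree, feature_names, depth=1, node=0):
    """Yield if-then source lines for one fitted tree (reads its `tree_` arrays)."""
    t = tree.tree_
    pad = "    " * depth
    if t.children_left[node] == -1:  # -1 marks a leaf
        counts = list(t.value[node][0])
        label = max(range(len(counts)), key=counts.__getitem__)
        yield "{}return {}".format(pad, label)
        return
    name = feature_names[t.feature[node]]
    yield "{}if {} <= {!r}:".format(pad, name, float(t.threshold[node]))
    yield from _tree_lines(tree, feature_names, depth + 1, t.children_left[node])
    yield pad + "else:"
    yield from _tree_lines(tree, feature_names, depth + 1, t.children_right[node])


def forest_to_code(forest, feature_names, func_name="f"):
    """Return source with one function per tree plus a majority-vote wrapper."""
    args = ", ".join(feature_names)
    out = ["from collections import Counter", ""]
    for i, tree in enumerate(forest.estimators_):
        out.append("def _tree{}({}):".format(i, args))
        out.extend(_tree_lines(tree, feature_names))
        out.append("")
    calls = ", ".join(
        "_tree{}({})".format(i, args) for i in range(len(forest.estimators_))
    )
    defaults = ", ".join(name + "=0" for name in feature_names)
    out.append("def {}({}):".format(func_name, defaults))
    out.append("    votes = [{}]".format(calls))
    out.append("    # hard majority vote over the per-tree class indices")
    out.append("    return Counter(votes).most_common(1)[0][0]")
    return "\n".join(out) + "\n"
```

With scikit-learn installed, something like `forest_to_code(RandomForestClassifier(n_estimators=10).fit(X, y), names)` would produce the source. The generated file needs only `collections` from the standard library.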
I introduced a way to generate Python classification code from the rules of scikit-learn's decision tree and random forest. This makes it possible to classify directly, without loading a model through a library such as scikit-learn, so numpy, scipy, and scikit-learn do not need to be installed where the code runs. Also, because the result is ordinary Python code, I think another advantage is that it is easy to modify when you want to tweak the rules a little.
However, and this is not limited to code generation, when there are many features the dumped contents of a decision tree tend to become very complicated, so be careful about how you use it.
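One way to check this before generating anything is to estimate how large the output would be. The helper below is a hypothetical sketch (the name `generated_size` is not from the original) that assumes the layout used in this article: every split emits an `if` line and an `else:` line, every leaf emits one `return` line, and there is one `def` line. It reads only the `tree_` arrays of a fitted tree.

```python
def generated_size(clf):
    """Estimate (line_count, max_nesting) of the if-then code for a fitted tree."""
    t = clf.tree_

    def walk(node, depth):
        if t.children_left[node] == -1:  # leaf: a single `return` line
            return 1, depth
        left_lines, left_depth = walk(t.children_left[node], depth + 1)
        right_lines, right_depth = walk(t.children_right[node], depth + 1)
        # a split contributes its `if` line and its `else:` line
        return 2 + left_lines + right_lines, max(left_depth, right_depth)

    lines, nesting = walk(0, 0)
    return lines + 1, nesting  # +1 for the `def` line; docstring not counted
```

For the iris tree shown earlier this comes to 26 lines with leaves nested up to 5 levels deep. If the numbers get out of hand, limiting the tree when training, for example with the `max_depth` parameter of `DecisionTreeClassifier`, keeps the generated code readable at some cost in accuracy.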