Generate Python code from scikit-learn decision tree / random forest rules

Introduction

scikit-learn is the standard choice for machine learning in Python, but using it means installing large libraries such as numpy and scipy. If you only want to add classification with an already trained model to a program, you may not want to pull in such heavy dependencies. It would be nice to use what was learned with scikit-learn for classification without needing numpy, scipy, or scikit-learn at run time.

So in this post I introduce a way to generate if-then style code from scikit-learn's decision tree (DecisionTreeClassifier) and random forest (RandomForestClassifier).

Code

The generator creates the source of a function that takes feature values as input and returns the class number. Each branch and each leaf in the generated code is annotated with the number of training samples that reached it, so you can see which branches matter most. I wrote this with reference to code that was posted on Stack Overflow.

Code like the following is generated from a decision tree trained on the iris data.

def f(sepal_length=0, sepal_width=0, petal_length=0, petal_width=0):
    """
    0 -> setosa
    1 -> versicolor
    2 -> virginica
    """
    if petal_width <= 0.800000011921:  # samples=150
        return 0  # samples=50
    else:
        if petal_width <= 1.75:  # samples=100
            if petal_length <= 4.94999980927:  # samples=54
                if petal_width <= 1.65000009537:  # samples=48
                    return 1  # samples=47
                else:
                    return 2  # samples=1
            else:
                if petal_width <= 1.54999995232:  # samples=6
                    return 2  # samples=3
                else:
                    if petal_length <= 5.44999980927:  # samples=3
                        return 1  # samples=2
                    else:
                        return 2  # samples=1
        else:
            if petal_length <= 4.85000038147:  # samples=46
                if sepal_length <= 5.94999980927:  # samples=3
                    return 1  # samples=1
                else:
                    return 2  # samples=2
            else:
                return 2  # samples=43
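
As a rough illustration of how such a generator can work, here is a minimal sketch (it is not the code from the notebook). It walks arrays laid out like the attributes of a fitted scikit-learn `tree_` object (children_left, children_right, feature, threshold, value, n_node_samples) and emits nested if/else source. To keep it runnable without scikit-learn, the tree below is a hand-written toy cut off at depth 2, and its per-class counts are made up for illustration. The names tree_to_code and visit are my own. With a real model you would presumably pass plain lists such as `clf.tree_.children_left.tolist()`, and `clf.tree_.value[:, 0, :].tolist()` for the per-class values.

```python
TREE_LEAF = -1  # scikit-learn stores -1 as the child index of a leaf node


def tree_to_code(children_left, children_right, feature, threshold,
                 value, n_node_samples, feature_names, func_name="f"):
    """Return Python source for a function that walks the tree with if/else."""
    args = ", ".join("{}=0".format(name) for name in feature_names)
    lines = ["def {}({}):".format(func_name, args)]

    def visit(node, depth):
        indent = "    " * depth
        if children_left[node] == TREE_LEAF:
            # Leaf: return the index of the class with the largest count.
            counts = value[node]
            best = max(range(len(counts)), key=lambda i: counts[i])
            lines.append("{}return {}  # samples={}".format(
                indent, best, n_node_samples[node]))
            return
        # Split node: samples with feature <= threshold go to the left child.
        name = feature_names[feature[node]]
        lines.append("{}if {} <= {!r}:  # samples={}".format(
            indent, name, threshold[node], n_node_samples[node]))
        visit(children_left[node], depth + 1)
        lines.append("{}else:".format(indent))
        visit(children_right[node], depth + 1)

    visit(0, 1)  # node 0 is the root; the body is indented one level
    return "\n".join(lines) + "\n"


# Hand-written toy tree (illustrative values, not taken from a fitted model).
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
feature = [3, -2, 3, -2, -2]  # -2 means "no feature" at a leaf
threshold = [0.8, -2.0, 1.75, -2.0, -2.0]
value = [[50, 50, 50], [50, 0, 0], [0, 50, 50], [0, 49, 5], [0, 1, 45]]
n_node_samples = [150, 50, 100, 54, 46]

source = tree_to_code(children_left, children_right, feature, threshold,
                      value, n_node_samples, feature_names)
print(source)

# The generated text is ordinary Python, so it can be saved to a .py file
# or, as here, executed directly to obtain the function.
namespace = {}
exec(source, namespace)
f = namespace["f"]
```

Writing `source` to a file gives a module that can be imported in an environment where numpy, scipy, and scikit-learn are not installed.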

Reference notebook

- Decision tree
- Random forest

For a random forest a little more work is needed: code is generated for each of the trees, and a majority vote over their results is written as well.
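
The majority vote over several generated trees might look like the sketch below. The three tree functions are hand-written stand-ins with hypothetical thresholds, not output from a trained forest, and forest_predict is a name I made up. Note that, as far as I know, scikit-learn's own RandomForestClassifier averages the class probabilities of the trees rather than counting hard votes, so the result can occasionally differ from the library's prediction.

```python
from collections import Counter


# Stand-ins for functions generated from individual trees (hypothetical rules).
def tree_0(petal_length=0, petal_width=0):
    if petal_width <= 0.8:
        return 0
    elif petal_width <= 1.75:
        return 1
    return 2


def tree_1(petal_length=0, petal_width=0):
    if petal_length <= 2.45:
        return 0
    elif petal_length <= 4.75:
        return 1
    return 2


def tree_2(petal_length=0, petal_width=0):
    if petal_width <= 1.65:
        return 0 if petal_length <= 2.6 else 1
    return 2


TREES = [tree_0, tree_1, tree_2]


def forest_predict(trees, **features):
    """Return the class predicted by the largest number of trees."""
    votes = Counter(tree(**features) for tree in trees)
    # On a tie, most_common keeps the class that was voted for first.
    return votes.most_common(1)[0][0]


# tree_0 votes for class 1, tree_1 and tree_2 vote for class 2.
print(forest_predict(TREES, petal_length=5.0, petal_width=1.7))
```

Using an odd number of trees reduces ties in the two-class case, but with three or more classes ties are still possible, so the tie-breaking rule is worth deciding explicitly.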

Conclusion

I introduced a way to generate Python classification code from the rules of scikit-learn decision trees and random forests. This makes it possible to run classification directly, without loading a model through a library such as scikit-learn, so numpy, scipy, and scikit-learn do not need to be installed where the classifier runs. Also, because the result is ordinary Python code, I think it is an advantage that it is easy to edit when you want to tweak the rules a little.

That said, and this is not limited to code generation, when there are many features the output of a decision tree tends to become very complicated, so take care in how you use it.
