Generate Python code from scikit-learn decision tree / random forest rules

Introduction

scikit-learn is the standard choice for machine learning in Python, but using it means installing large libraries such as numpy and scipy. If you only want to add classification by a trained model to a program, you may not want to bring in such heavy dependencies. I wanted to be able to use what was learned with scikit-learn for classification processing without needing numpy, scipy, or scikit-learn at run time.

So this time I will introduce a script that generates if-then style code from scikit-learn's decision tree (DecisionTreeClassifier) and random forest (RandomForestClassifier).

Code

https://github.com/ikegami-yukino/misc/blob/master/machinelearning/dt2code.py

The script generates the code of a function that takes feature values as input and returns the class number. The generated code is annotated with the number of samples at each branch point, so you can see which branches matter. By the way, I wrote this with reference to code that was posted on Stack Overflow.

Code like this is generated from a decision tree trained on the iris data.

def f(sepal_length=0, sepal_width=0, petal_length=0, petal_width=0):
    """
    0 -> setosa
    1 -> versicolor
    2 -> virginica
    """
    if petal_width <= 0.800000011921:  # samples=150
        return 0  # samples=50
    else:
        if petal_width <= 1.75:  # samples=100
            if petal_length <= 4.94999980927:  # samples=54
                if petal_width <= 1.65000009537:  # samples=48
                    return 1  # samples=47
                else:
                    return 2  # samples=1
            else:
                if petal_width <= 1.54999995232:  # samples=6
                    return 2  # samples=3
                else:
                    if petal_length <= 5.44999980927:  # samples=3
                        return 1  # samples=2
                    else:
                        return 2  # samples=1
        else:
            if petal_length <= 4.85000038147:  # samples=46
                if sepal_length <= 5.94999980927:  # samples=3
                    return 1  # samples=1
                else:
                    return 2  # samples=2
            else:
                return 2  # samples=43
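
The generation step behind output like the above can be sketched roughly as follows. This is not the code from dt2code.py, only a minimal illustration under some assumptions: the function name tree_to_code and the leaf_class argument are hypothetical, and the tree is passed in as plain Python lists so the sketch runs without scikit-learn. With a fitted classifier, the lists would correspond to the arrays on clf.tree_ (children_left, children_right, feature, threshold, n_node_samples), and leaf_class could be derived from clf.tree_.value by taking the argmax per node.

```python
def tree_to_code(children_left, children_right, feature, threshold,
                 n_node_samples, leaf_class, feature_names, func_name="f"):
    """Return Python source for an if-then function equivalent to the tree.

    Every sequence is indexed by node id, mirroring the arrays on
    scikit-learn's `clf.tree_`. `leaf_class[i]` is the class predicted at
    node i (a hypothetical, precomputed input for this sketch).
    """
    args = ", ".join(f"{name}=0" for name in feature_names)
    lines = [f"def {func_name}({args}):"]

    def walk(node, depth):
        indent = "    " * depth
        samples = n_node_samples[node]
        if children_left[node] == -1:  # scikit-learn marks a leaf with -1
            lines.append(f"{indent}return {leaf_class[node]}  # samples={samples}")
            return
        name = feature_names[feature[node]]
        lines.append(f"{indent}if {name} <= {threshold[node]!r}:  # samples={samples}")
        walk(children_left[node], depth + 1)   # branch where the test is true
        lines.append(f"{indent}else:")
        walk(children_right[node], depth + 1)  # branch where the test is false

    walk(0, 1)  # node 0 is the root; the body is indented one level
    return "\n".join(lines)


# A small hand-built tree (illustrative values, not a trained model).
source = tree_to_code(
    children_left=[1, -1, 3, -1, -1],
    children_right=[2, -1, 4, -1, -1],
    feature=[1, -2, 1, -2, -2],
    threshold=[0.8, -2.0, 1.75, -2.0, -2.0],
    n_node_samples=[150, 50, 100, 54, 46],
    leaf_class=[0, 0, 1, 1, 2],
    feature_names=["petal_length", "petal_width"],
)
print(source)

# The generated source is ordinary Python, so it can be executed directly.
namespace = {}
exec(source, namespace)
f = namespace["f"]
```

Writing the source to a .py file instead of calling exec gives a module that can be imported without numpy, scipy, or scikit-learn.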

Reference notebook

- Decision tree
- Random forest

In the case of random forest, it takes a little more effort, because you have to generate code from multiple trees and then write the majority voting.
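
The majority voting step could look something like the sketch below. The three tree functions are hypothetical stand-ins for generated code (their thresholds are made up for illustration), and predict is a name chosen here, not something taken from the notebook. Note also that scikit-learn's RandomForestClassifier averages the class probabilities of its trees, so a plain majority vote like this may occasionally disagree with the original model.

```python
from collections import Counter


# Hypothetical stand-ins for functions generated from three trees of a forest.
def tree_0(petal_length=0, petal_width=0):
    if petal_width <= 0.8:
        return 0
    elif petal_width <= 1.75:
        return 1
    return 2


def tree_1(petal_length=0, petal_width=0):
    if petal_length <= 2.45:
        return 0
    elif petal_length <= 4.95:
        return 1
    return 2


def tree_2(petal_length=0, petal_width=0):
    if petal_width <= 0.8:
        return 0
    elif petal_length <= 4.85:
        return 1
    return 2


TREES = (tree_0, tree_1, tree_2)


def predict(**features):
    """Return the class chosen by most trees.

    On a tie, the class that was voted for first (in TREES order) wins.
    """
    votes = Counter(tree(**features) for tree in TREES)
    return votes.most_common(1)[0][0]


print(predict(petal_length=5.0, petal_width=1.5))
```

In the printed example, tree_0 votes for class 1 while tree_1 and tree_2 vote for class 2, so the majority result is 2.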

Conclusion

I introduced a script that generates Python code for classification from the rules of scikit-learn's decision tree and random forest. This makes it possible to perform classification directly, without loading the model through a library such as scikit-learn, so numpy, scipy, and scikit-learn no longer need to be installed where the classifier runs. Also, since the result is ordinary Python code, I think it is an advantage that it is easy to edit when you want to tweak the rules a little.

However, and this is not limited to code generation, when there are many features the output of a decision tree's contents tends to become very complicated, so be careful about how you use it.