[PYTHON] Learning Deep Forest, a new learner comparable to DNN

I want to step into the bushes of Deep Forest rather than the darkness of Deep Learning.

An algorithm that may serve as an alternative to DNNs is Deep Forest. Reading the article Deep Forest: Towards An Alternative to Deep Neural Networks, or the [paper](https://arxiv.org/pdf/1702.08835.pdf) itself, it seems to achieve a deep structure by using multiple ensembles of decision trees, called random forests, and arranging them in both the width and depth directions.
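
To make the cascade part of that structure concrete, here is a minimal, hypothetical sketch (not the repository's code): each level's forests emit class-probability vectors, which are concatenated with the original features and handed to the next level. Multi-grained scanning is omitted, and the forest counts and level count here are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Two cascade levels; each level holds two forests whose class-probability
# outputs are appended to the raw features for the next level.
features = X
for level in range(2):
    forests = [
        RandomForestClassifier(n_estimators=50, random_state=level),
        ExtraTreesClassifier(n_estimators=50, random_state=level),
    ]
    # NOTE: the real algorithm generates these vectors with k-fold
    # cross-validation; fitting and predicting on the same data, as here,
    # only illustrates the data flow.
    probas = [f.fit(features, y).predict_proba(features) for f in forests]
    features = np.hstack([X] + probas)

print(features.shape)  # (150, 4 + 2 forests * 3 classes) = (150, 10)
```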

For Random Forest itself, this article will be helpful. It explains random forests in scikit-learn, a machine learning library for Python; since the code we run this time is also Python and also uses scikit-learn, it is extremely helpful.
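
For reference, the scikit-learn random forest API that both that article and the code below rely on looks like this (a standard minimal example):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Train a random forest on the iris dataset and report held-out accuracy.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print('accuracy: {:.3f}'.format(clf.score(X_test, y_test)))
```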

Besides Python, there also seems to be an implementation in the R language. (Deep Forest implementation code example)

Get the code

Creating a Deep Forest from scratch is hard, and I would get lost along the way. So instead, let's get a Deep Forest implemented in Python from GitHub.

https://github.com/leopiney/deep-forest

If you follow the README, everything from training to testing works fine. The accuracy seems reasonable, not bad at all. Note that it only supports the CPU, so CPU usage will get quite high.

Save model

That alone is enough to get a feel for it, but it is a little inconvenient that the trained model cannot be saved, so I will add the following two member functions to the MGCForest class in deep_forest.py.

deep_forest.py


```python
# imports needed by save_model / load_model below (place at the top of deep_forest.py)
import glob
import re

import numpy as np
from sklearn.externals import joblib  # on newer scikit-learn: import joblib


class MGCForest():

    # ... (existing code unchanged) ...

    def save_model(self):
        # save multi-grained scanner
        for mgs_instance in self.mgs_instances:
            stride_ratio = mgs_instance.stride_ratio
            folds = mgs_instance.folds
            for i, estimator in enumerate(mgs_instance.estimators):
                joblib.dump(estimator, 'model/mgs_submodel_%.4f_%d_%d.pkl' % (stride_ratio, folds, i + 1)) 
        
        # save cascade forest
        for n_level, one_level_estimators in enumerate(self.c_forest.levels):
            for i, estimator in enumerate(one_level_estimators):
                joblib.dump(estimator, 'model/cforest_submodel_%d_%d.pkl' % (n_level + 1, i + 1))

    def load_model(self):
        # load multi-grained scanner
        for mgs_instance in self.mgs_instances:
            stride_ratio = '%.4f' % mgs_instance.stride_ratio
            folds = mgs_instance.folds
            for i in range(len(mgs_instance.estimators)):
                model_name = 'model/mgs_submodel_%s_%d_%d.pkl' % (stride_ratio, folds, i + 1)
                print('load model: {}'.format(model_name))
                mgs_instance.estimators[i] = joblib.load(model_name)

        # load cascade forest
        model_files = glob.glob('model/cforest_submodel_*.pkl')
        model_files.sort()
        max_level = 0
        model_dict = dict()
        for model_name in model_files:
            model_subname = re.sub('model/cforest_submodel_', '', model_name)
            model_level = int(model_subname.split('_')[0])
            if max_level < model_level:
                max_level = model_level

            if model_level not in model_dict.keys():
                model_dict[model_level] = list()
            print('load model: {}'.format(model_name))
            model_dict[model_level].append(joblib.load(model_name))

        self.c_forest.levels = list()
        for n_level in range(1, max_level + 1):
            self.c_forest.levels.append(model_dict[n_level])

        # restore the class labels (assumes labels are 0 .. n_classes_ - 1)
        n_classes_ = self.c_forest.levels[0][0].n_classes_
        self.c_forest.classes = np.unique(np.arange(n_classes_))
```

If you call the save_model function after training with the fit function, the model parameters are saved into the model directory (create the model directory and empty its contents before calling save_model). To restore the trained parameters, call the load_model function.
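
A usage sketch follows. The MGCForest constructor arguments here follow the repository README as I understand it, so treat them as assumptions and check the repository code; the dummy data is only there to show the call order (for a real run, use the dataset from the README).

```python
import os

import numpy as np
from sklearn.ensemble import RandomForestClassifier

from deep_forest import MGCForest  # the class modified above

os.makedirs('model', exist_ok=True)  # save_model writes its pkl files here

# Dummy data just to illustrate the call order (hypothetical shapes).
X_train = np.random.rand(200, 16).astype(np.float32)
y_train = np.random.randint(0, 3, size=200)
X_test = np.random.rand(20, 16).astype(np.float32)

# Constructor config as suggested by the repository README (an assumption).
forest = MGCForest(estimators_config={
    'mgs': [{
        'estimator_class': RandomForestClassifier,
        'estimator_params': {'n_estimators': 30, 'n_jobs': -1},
    }],
    'cascade': [{
        'estimator_class': RandomForestClassifier,
        'estimator_params': {'n_estimators': 1000, 'n_jobs': -1},
    }],
})
forest.fit(X_train, y_train)   # train as usual
forest.save_model()            # persist every sub-forest into model/

# In a later session: build an MGCForest with the same config, then restore.
forest.load_model()
y_pred = forest.predict(X_test)
```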

Deep Forest is built from multiple random forests, so when saving a model, a parameter file has to be written for each individual random forest. That is why the model directory ends up containing multiple pkl files.
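
After a successful save, you can check the directory contents like this; the example filenames are just illustrations of the naming patterns used in save_model above.

```python
import glob

# List every sub-forest parameter file written by save_model.
for path in sorted(glob.glob('model/*.pkl')):
    print(path)
# e.g. model/cforest_submodel_1_1.pkl
#      model/mgs_submodel_0.2500_3_1.pkl
```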

Random forests have the inherent advantage of being unaffected by differences in the value ranges of the individual features. Neural networks are not: there you need to normalize each feature's values to the range 0 to 1. I therefore expect Deep Forest to really show its power when you want to combine not only images but various other kinds of features as well.
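
For comparison, this is the kind of 0-to-1 normalization a neural network would need, shown with scikit-learn's MinMaxScaler; with random forests you can feed the raw features as-is.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features with very different ranges.
X = np.array([[1.0, 200.0],
              [2.0, 800.0],
              [3.0, 500.0]])

X_scaled = MinMaxScaler().fit_transform(X)  # each column rescaled to [0, 1]
print(X_scaled)
```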
