[PYTHON] Random Forest (2)

Performance evaluation of scikit-learn + Random Forest, part 2

Introduction

Last time we evaluated performance on balanced data, so this time we train on imbalanced data.

The data is MNIST, the same as last time.

Last time, 2000 samples were extracted for each digit.

This time, 1100, 1300, 1500, 1700, 1900, 2100, 2300, 2500, 2700, and 2900 samples were extracted for the digits 0 through 9, in that order.

As before, the test data is 10000 balanced samples.
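The per-digit extraction described above could be sketched roughly as follows. This is not the original code: the helper `sample_per_class` and the seed are hypothetical, and it assumes `X` and `y` are NumPy arrays holding the MNIST images and labels. The counts sum to 20000, the same total as 2000 samples for each of 10 digits.

```python
import numpy as np

# Per-digit sample counts from the text: 1100 for digit 0, ..., 2900 for digit 9.
COUNTS = {digit: 1100 + 200 * digit for digit in range(10)}


def sample_per_class(X, y, counts, seed=0):
    """Draw counts[label] rows without replacement for each label (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    picked = np.concatenate([
        rng.choice(np.flatnonzero(y == label), size=n, replace=False)
        for label, n in counts.items()
    ])
    rng.shuffle(picked)  # mix the classes so the rows are not grouped by digit
    return X[picked], y[picked]
```

Calling `sample_per_class(X, y, COUNTS)` returns the imbalanced training subset; the per-class counts can be checked with `np.bincount` on the returned labels.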

Variables to change

Three parameters are varied:

- Number of trees
- Exploration depth
- Number of features

Experiment

Number of trees

First, the number of trees is varied over four values: 10, 100, 1000, and 10000.

The result is shown below.

trees.png

The best accuracy last time was about 0.965 (?); accuracy has dropped slightly this time, but the trend is the same.

I think about 1000 trees is enough.
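A sweep over the number of trees might look like the sketch below. It is only an illustration under assumptions: scikit-learn's small bundled `load_digits` set stands in for MNIST, the helper `sweep_n_estimators` is hypothetical, and only 10 and 100 trees are run so that it finishes quickly (the experiment in the text goes up to 10000).

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def sweep_n_estimators(X_train, y_train, X_test, y_test, tree_counts):
    """Fit one forest per tree count and return {n_trees: test_accuracy}."""
    scores = {}
    for n_trees in tree_counts:
        clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
        clf.fit(X_train, y_train)
        scores[n_trees] = clf.score(X_test, y_test)
    return scores


# Stand-in data (assumption): 8x8 digit images instead of MNIST.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

tree_scores = sweep_n_estimators(X_train, y_train, X_test, y_test, [10, 100])
for n_trees, acc in tree_scores.items():
    print(f"n_estimators={n_trees}: accuracy={acc:.4f}")
```

Training time grows roughly linearly with `n_estimators`, which is one practical reason to stop at around 1000 when the accuracy has already flattened.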

Exploration depth

Next is the depth search.

As before, the maximum depth is varied from 2 to 20.

The number of trees is 1000 and the number of features is sqrt(features).

The result is shown below.

depth.png

This is also the same as last time: the accuracy is almost the same, and overfitting does not occur even when the trees are deep.
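One way to check the overfitting claim is to record training and test accuracy side by side for each depth. The sketch below does this under assumptions: `load_digits` stands in for MNIST, the helper `sweep_max_depth` is hypothetical, and only a few depths and 50 trees are used to keep the run short (the text uses depths 2 to 20 with 1000 trees).

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def sweep_max_depth(X_train, y_train, X_test, y_test, depths, n_estimators=1000):
    """Return {depth: (train_accuracy, test_accuracy)} for each max_depth."""
    results = {}
    for depth in depths:
        clf = RandomForestClassifier(
            n_estimators=n_estimators,
            max_depth=depth,
            max_features="sqrt",
            random_state=0,
        )
        clf.fit(X_train, y_train)
        results[depth] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    return results


# Stand-in data (assumption): 8x8 digit images instead of MNIST.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

depth_results = sweep_max_depth(
    X_train, y_train, X_test, y_test, depths=[2, 5, 10, 20], n_estimators=50
)
for depth, (train_acc, test_acc) in depth_results.items():
    print(f"max_depth={depth}: train={train_acc:.4f} test={test_acc:.4f}")
```

If test accuracy started to fall while training accuracy kept rising as the depth increased, that would indicate overfitting.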

Number of features

Finally, the number of features.

It is varied from 10 to 55.

The number of trees is 1000 and the depth is fixed at the maximum.

feature.png

With sqrt the number of features is 28 (sqrt(784) = 28), so is it better to use fewer than that this time?

However, the differences are on the order of 0.001, so it may be fair to say there is no big difference at 20 or more.
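The feature-count sweep can be sketched by passing an integer to `max_features`. This is an illustration under assumptions: `load_digits` (64 features) stands in for MNIST (784 features), so the candidate values differ from the 10 to 55 range in the text, and "depth fixed at max" is assumed to mean `max_depth=None`.

```python
import math

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# MNIST images have 28 * 28 = 784 pixels, so the sqrt setting gives 28 features per split.
assert math.isqrt(28 * 28) == 28

# Stand-in data (assumption): 8x8 digit images, i.e. 64 features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

feature_scores = {}
for k in [4, 8, 16, 32]:
    # max_depth=None lets every tree grow fully.
    clf = RandomForestClassifier(
        n_estimators=50, max_depth=None, max_features=k, random_state=0
    )
    clf.fit(X_train, y_train)
    feature_scores[k] = clf.score(X_test, y_test)
    print(f"max_features={k}: accuracy={feature_scores[k]:.4f}")

# With max_features="sqrt", each fitted tree exposes the resolved integer.
sqrt_clf = RandomForestClassifier(n_estimators=10, max_features="sqrt", random_state=0)
sqrt_clf.fit(X_train, y_train)
print("sqrt resolves to", sqrt_clf.estimators_[0].max_features_)
```

Smaller values of `max_features` make the individual trees less correlated with each other, at the cost of weaker single trees.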

Comparative experiment

Finally, the result of an SVM for comparison.

RBF kernel with C = 1.0 and gamma = 1/784.
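The SVM setting above corresponds to something like the following sketch. Several things here are assumptions: `load_digits` stands in for MNIST, so gamma becomes 1/64 by the same 1/n_features rule, and the pixels are scaled to [0, 1], which the text does not mention either way.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): 8x8 digit images instead of MNIST.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # assumption: scale pixel values (0..16 in this dataset) to [0, 1]
n_features = X.shape[1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The text uses gamma = 1/784, i.e. 1 / n_features for MNIST; the same rule is applied here.
svm = SVC(kernel="rbf", C=1.0, gamma=1.0 / n_features)
svm.fit(X_train, y_train)
svm_accuracy = svm.score(X_test, y_test)
print(f"SVM accuracy: {svm_accuracy:.4f}")
```

In scikit-learn, `gamma="auto"` also means 1 / n_features, so on MNIST it gives the same 1/784 value.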

As expected, Random Forest is more accurate, but the accuracy of the SVM is higher than last time ...?

That could simply be due to random sampling, but considering that the accuracy of Random Forest dropped by about 0.05, perhaps the SVM is more robust to imbalanced data ...?

Accuracy on MNIST is too high across the board to draw much of a conclusion ...
