[PYTHON] [Machine learning] Try studying random forest

What is Random Forest?

Random Forest is an ensemble algorithm widely used in machine learning. It is an ensemble learning method that improves accuracy by combining many decision trees, a supervised learning model. As shown in the figure below, it is called a random forest because it has a forest-like structure that combines the results of multiple trees. One characteristic of decision trees is that they overfit easily. A random forest reduces this effect by aggregating the predictions of many trees, each trained on a different random subset of the data and features.

[Figure: multiple decision trees whose results are combined into a single random forest prediction]

Random forest algorithm

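  1. Randomly select k features from a sample dataset with m features (k < m).
  2. Build a decision tree using those k features.
  3. Repeat steps 1 and 2 n times, changing the combination of features and drawing a different bootstrap sample (a random subset of the rows, sampled with replacement) for each tree, to build n decision trees.
  4. For a classification problem, output the mode (majority vote) of all the trees' results; for a regression problem, output the average of all the trees' results as the final prediction.

Note: in Breiman's original random forest and in scikit-learn, the random subset of features is re-drawn at every split inside a tree rather than once per tree. scikit-learn's classifier also averages the trees' predicted class probabilities instead of counting hard votes.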
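The four steps can be sketched from scratch in Python as below. This is a minimal illustration, not scikit-learn's implementation: it picks the k features once per tree, exactly as the steps describe, and assumes k = sqrt(m) and integer class labels. The helper names `fit_forest` and `predict_forest` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, k=None, seed=0):
    """Hypothetical helper: build n_trees trees, each on a bootstrap sample and k random features."""
    rng = np.random.default_rng(seed)
    n_samples, m = X.shape
    k = k or max(1, int(np.sqrt(m)))  # assumption: k = sqrt(m), a common default
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n_samples, size=n_samples)  # bootstrap sample (with replacement)
        cols = rng.choice(m, size=k, replace=False)        # step 1: k of the m features
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(X[rows][:, cols], y[rows])                # step 2: one tree on those features
        forest.append((tree, cols))                        # step 3: repeated n_trees times
    return forest


def predict_forest(forest, X):
    """Hypothetical helper, step 4 (classification): mode of all trees' predictions."""
    votes = np.stack([tree.predict(X[:, cols]) for tree, cols in forest])  # (n_trees, n_samples)
    # Majority vote per sample; assumes labels are non-negative integers
    return np.array([np.bincount(col).argmax() for col in votes.T])


X, y = make_classification(n_samples=300, n_features=16, random_state=0)
forest = fit_forest(X, y)
pred = predict_forest(forest, X)
print("training accuracy:", (pred == y).mean())
```

For regression, step 4 would replace the vote with `votes.mean(axis=0)` over trees built with `DecisionTreeRegressor`.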

Random Forest Pros and Cons

Pros

- Can be used for both regression and classification.
- Reduces the effects of overfitting.
- The model is unlikely to be affected by slight fluctuations in the input data.
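To illustrate the first point, scikit-learn provides both a classifier and a regressor. This sketch uses synthetic data from `make_classification` and `make_regression` (an assumption for illustration) and checks that the regressor's output is the average of its individual trees' predictions.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: scikit-learn averages the trees' class probabilities and picks the top class
Xc, yc = make_classification(n_samples=200, n_features=8, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(Xc, yc)
print(clf.predict(Xc[:5]))

# Regression: the forest outputs the average of the trees' predictions
Xr, yr = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(Xr, yr)
print(reg.predict(Xr[:5]))

# Averaging the fitted trees (reg.estimators_) by hand gives the same result
manual_mean = np.mean([tree.predict(Xr[:5]) for tree in reg.estimators_], axis=0)
print(np.allclose(manual_mean, reg.predict(Xr[:5])))  # True
```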

Cons

- It can still overfit data that contains a lot of noise.
- Computation is more complex than for a single decision tree.
- Training and prediction take longer.
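The noise and computation-time points can be checked with a quick experiment. This is a sketch under assumed settings: synthetic data where `flip_y=0.3` assigns a random class to about 30% of the samples, and a forest of 200 trees. Exact scores and timings will vary by machine and library version.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Very noisy labels: about 30% of samples get a randomly assigned class
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for name, model in [("decision tree", DecisionTreeClassifier(random_state=0)),
                    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0))]:
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    # (training accuracy, test accuracy, fit time in seconds)
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test), elapsed)
    print(f"{name}: train={results[name][0]:.2f} test={results[name][1]:.2f} fit={elapsed:.3f}s")
```

Typically the forest generalizes better than the single tree, but both fit the noisy training labels far better than the test labels, and the forest takes noticeably longer to fit.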

scikit-learn random forest

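import sklearn.datasets
import sklearn.ensemble

train_X, train_y = sklearn.datasets.load_iris(return_X_y=True)  # example data
rf = sklearn.ensemble.RandomForestClassifier()
rf.fit(train_X, train_y)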
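After fitting, the model is usually evaluated on held-out data. A minimal sketch, assuming the iris dataset stands in for `train_X` / `train_y` and a 70/30 train/test split:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: iris stands in for the article's data
X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

rf = RandomForestClassifier(random_state=0)
rf.fit(train_X, train_y)

pred_y = rf.predict(test_X)       # final class labels of the whole forest
proba = rf.predict_proba(test_X)  # class probabilities averaged over all trees
print("test accuracy:", accuracy_score(test_y, pred_y))
print("feature importances:", rf.feature_importances_)  # impurity-based, sums to 1
```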

Random Forest parameters

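| Parameter | Overview | Options | Default |
|---|---|---|---|
| n_estimators | Number of trees in the forest | int | 100 (10 before version 0.22) |
| criterion | Split criterion | "gini", "entropy" | "gini" |
| splitter | Split selection strategy (a DecisionTreeClassifier parameter; RandomForestClassifier does not accept it) | "best", "random" | "best" |
| max_depth | Maximum depth of each tree (None grows each tree until its leaves are pure or too small to split) | int, None | None |
| min_samples_split | Minimum number of samples required to split an internal node (smaller values tend to overfit) | int (number of samples) / float (fraction of all samples) | 2 |
| min_samples_leaf | Minimum number of samples required at a leaf (terminal node) (smaller values tend to overfit) | int (number of samples) / float (fraction of all samples) | 1 |
| max_features | Number of features considered when looking for the best split (larger values tend to overfit) | int / float, "sqrt", "log2", None | "sqrt" (older versions: "auto", which means "sqrt" for classification) |
| class_weight | Class weights | "balanced", "balanced_subsample", dict, None | None |
| presort | Pre-sort the data to speed up finding splits (a DecisionTreeClassifier parameter, deprecated in 0.22 and removed in 0.24; not accepted by RandomForestClassifier) | bool | False |
| min_impurity_decrease | A node is split only if the split decreases impurity by at least this value (limits tree growth) | float | 0.0 |
| bootstrap | Whether each tree is built on a bootstrap sample; if False, every tree uses the whole dataset | bool | True |
| oob_score | Whether to use the out-of-bag samples (those not drawn in a tree's bootstrap sample) to estimate accuracy | bool | False |
| n_jobs | Number of jobs to run in parallel for fit and predict; -1 uses all processors | int, None | None (1 job) |
| random_state | Seed controlling the randomness of bootstrapping and feature selection | int, RandomState instance, None | None |
| verbose | Verbosity of output during fit and predict | int | 0 |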
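Several of these parameters can be combined in one constructor call. The values below are illustrative assumptions, not tuned recommendations, and the breast cancer dataset bundled with scikit-learn stands in for real data. With `bootstrap=True` and `oob_score=True`, the out-of-bag accuracy is available as `oob_score_` without a separate test set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(
    n_estimators=300,     # number of trees (n in the algorithm above)
    criterion="entropy",  # split on information gain instead of Gini impurity
    max_depth=8,          # cap tree depth to limit overfitting
    min_samples_leaf=2,   # every leaf must hold at least 2 samples
    max_features="sqrt",  # features considered at each split
    bootstrap=True,       # each tree is trained on a bootstrap sample...
    oob_score=True,       # ...so the left-out samples can score the model
    n_jobs=-1,            # use all processors for fit and predict
    random_state=0,       # reproducible results
)
rf.fit(X, y)
print("out-of-bag accuracy:", round(rf.oob_score_, 3))
```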
