Random Forest is an ensemble learning method widely used in machine learning. It improves accuracy by combining many **decision trees**, a supervised learning model. As shown in the figure below, it is called a random forest because it has a forest-like structure that combines the results of multiple trees. One characteristic of decision trees is that they are **prone to overfitting**. A random forest reduces the effect of that overfitting because each tree is trained on a random subset of the samples and features, and the trees' predictions are then averaged or put to a vote.
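To make "combining the results of multiple trees" concrete, here is a minimal sketch that averages the per-tree class probabilities by hand and compares the result with the forest's own prediction. The synthetic dataset and the choice of 25 trees are assumptions made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

rf = RandomForestClassifier(n_estimators=25, random_state=0)
rf.fit(X, y)

# Each fitted tree is available in rf.estimators_.
# Average the class probabilities predicted by the individual trees.
tree_probas = np.stack([tree.predict_proba(X) for tree in rf.estimators_])
mean_proba = tree_probas.mean(axis=0)

# Pick the class with the highest averaged probability.
manual_pred = rf.classes_[np.argmax(mean_proba, axis=1)]

# Both checks compare the hand-rolled result with the forest's output.
print(np.allclose(mean_proba, rf.predict_proba(X)))
print(np.array_equal(manual_pred, rf.predict(X)))
```

In scikit-learn the classifier averages probabilities rather than counting hard votes, so a tree that is very sure of its answer weighs more than one that is undecided.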
Advantages:

- Can be used for both regression and classification.
- Reduces the effects of overfitting.
- The model is unlikely to be affected by slight fluctuations in the input data.
Disadvantages:

- Can still overfit when the data contains too much noise.
- More computationally complex than a single decision tree.
- Training and prediction take longer.
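The overfitting trade-off can be checked with a small experiment. The sketch below trains a single decision tree and a random forest on the same noisy synthetic data and prints the training and test accuracy for each. The dataset, the noise level (`flip_y=0.1`), and the split ratio are assumptions chosen for illustration, and the exact numbers will vary with the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y randomly reassigns a fraction of the labels to simulate noise.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A fully grown single tree versus a forest of 100 trees.
tree = DecisionTreeClassifier(random_state=0).fit(train_X, train_y)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(train_X, train_y)

for name, model in [("decision tree", tree), ("random forest", forest)]:
    train_acc = model.score(train_X, train_y)
    test_acc = model.score(test_X, test_y)
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f}")
```

A large gap between training and test accuracy is the usual sign of overfitting, so comparing the two gaps shows how much the ensemble helps on a given dataset.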
import sklearn.ensemble
rf = sklearn.ensemble.RandomForestClassifier()
rf.fit(train_X, train_y)
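
The snippet above assumes that `train_X` and `train_y` already exist. As a self-contained sketch, the following version builds them from the iris dataset bundled with scikit-learn (the dataset and the 75/25 split are assumptions for illustration), then predicts on held-out data and reports the accuracy.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load features and labels, then hold out 25% of the rows for testing.
X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train with default parameters; random_state makes the run reproducible.
rf = RandomForestClassifier(random_state=0)
rf.fit(train_X, train_y)

# Predict class labels for unseen rows and measure mean accuracy.
pred_y = rf.predict(test_X)
accuracy = rf.score(test_X, test_y)
print(pred_y[:5], accuracy)
```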
Parameter | Overview | Options | Default |
---|---|---|---|
criterion | Split criteria | "gini", "entropy" | "gini" |
splitter | Split selection strategy (a DecisionTreeClassifier parameter; RandomForestClassifier does not accept it) | "best", "random" | "best" |
max_depth | Maximum depth of each tree (None lets the tree grow until it can no longer split) | int, None | None |
min_samples_split | Minimum number of samples a node must contain to be split (smaller values tend to overfit) | int (number of samples) / float (fraction of all samples) | 2 |
min_samples_leaf | Minimum number of samples required at a leaf (terminal node) (smaller values tend to overfit) | int / float | 1 |
max_features | Number of features considered at each split (larger values tend to overfit) | int / float, "sqrt", "log2", None | "sqrt" ("auto" in older versions) |
class_weight | Class weights | "balanced", "balanced_subsample", dict, None | None |
presort | Whether to presort the data (speed changes depending on the data size); a DecisionTreeClassifier parameter that has been removed in recent scikit-learn versions | bool | False |
min_impurity_decrease | A node is split only if the split decreases impurity by at least this value, which limits tree growth | float | 0.0 |
bootstrap | Whether to train each tree on a bootstrap sample instead of the full dataset | bool | True |
oob_score | Whether to evaluate accuracy on the samples left out of each bootstrap sample (out-of-bag samples) | bool | False |
n_jobs | Number of jobs to run in parallel for fit and predict (-1 uses all processors) | int, None | None |
random_state | Seed used when generating random numbers | int, None | None |
verbose | Verbosity of log output during fit and predict | int | 0 |
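
As a rough illustration of how several of these parameters are passed together, the sketch below sets them explicitly and reads back the out-of-bag score. The specific values are arbitrary assumptions, not tuning advice, and `n_estimators` (the number of trees) is a RandomForestClassifier parameter that is not listed in the table above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,         # number of trees (not in the table above)
    criterion="gini",         # split criterion
    max_depth=5,              # cap tree depth to limit overfitting
    min_samples_split=4,      # a node needs at least 4 samples to be split
    min_samples_leaf=2,       # every leaf keeps at least 2 samples
    max_features="sqrt",      # features considered at each split
    class_weight="balanced",  # reweight classes by inverse frequency
    bootstrap=True,           # required for oob_score=True
    oob_score=True,           # score on samples left out of each bootstrap
    n_jobs=-1,                # use all available processors
    random_state=0,           # reproducible results
    verbose=0,                # no progress output
)
rf.fit(X, y)

# Accuracy estimated from out-of-bag samples, without a separate test set.
print(rf.oob_score_)
```

Because `oob_score_` is computed from rows each tree never saw during training, it gives a quick accuracy estimate when holding out a test set is inconvenient.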