Overview

This article summarizes cross-validation. When the number of submissions is limited, as on Kaggle, it is unwise to check accuracy by submitting. Also, if you tune a model against its submission scores, the accuracy may drop in the final evaluation because the model generalizes poorly.

Therefore, part of the training data is held out as validation data. You can specify what fraction of the whole to hold out.
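As a minimal sketch of holding out part of the training data, the following uses scikit-learn's train_test_split. The Iris dataset, the 20% fraction, and the variable names are assumptions chosen only for illustration.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris is stand-in data here (an assumption, not from the article).
X, y = load_iris(return_X_y=True)

# Hold out 20% of the data as validation data; stratify keeps class ratios similar.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = svm.SVC()
model.fit(X_tr, y_tr)

# Accuracy on data the model did not see during training.
val_accuracy = model.score(X_val, y_val)
print(len(X_tr), len(X_val), val_accuracy)
```

A single split like this gives one score that depends on which rows were held out, which is the motivation for repeating the split several times.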

Cross-validation

Let's write a model. Suppose the training data is given as follows.

x_train  # Explanatory variables (features)
y_train  # Objective variable (target)

I will use a simple SVM this time as well.

from sklearn import svm
from sklearn.model_selection import cross_val_score

lng = svm.SVC()
score = cross_val_score(lng, x_train, y_train)
print(score)

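By default the data is split into 5 folds (in scikit-learn 0.22 and later; earlier versions used 3). The arguments are cross_val_score(estimator, explanatory variables, objective variable), and the result is an array with one score per fold.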
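Since cross_val_score returns one score per fold, it is common to summarize them with a mean and standard deviation. The sketch below assumes the Iris dataset as a stand-in for x_train and y_train, and writes cv=5 out explicitly.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

# Stand-in for x_train / y_train (an assumption for illustration).
X, y = load_iris(return_X_y=True)

clf = svm.SVC()
scores = cross_val_score(clf, X, y, cv=5)  # 5 folds, stated explicitly

print(scores)                       # one accuracy value per fold
print(scores.mean(), scores.std())  # average accuracy and its spread
```

A large spread between folds suggests that the score depends strongly on how the data happened to be split.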

To change the number of splits, pass a KFold object through the cv parameter as follows.

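from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold

iris = load_iris()  # Iris is used as the example dataset
kfold = KFold(n_splits=3)  # 3 folds, taken in row order (no shuffling)
lng = svm.SVC()
scores = cross_val_score(lng, iris.data, iris.target, cv=kfold)
print(scores)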

The data is now split into three folds.
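To make clear what cross_val_score does with a KFold object, here is a rough equivalent written as an explicit loop. It is a sketch under the assumption that Iris is the dataset; the names manual_scores and library_scores are introduced only for this example.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score

iris = load_iris()
X, y = iris.data, iris.target
kfold = KFold(n_splits=3)

manual_scores = []
for train_idx, test_idx in kfold.split(X):
    model = svm.SVC()                      # fresh model for every fold
    model.fit(X[train_idx], y[train_idx])  # train on the other folds
    manual_scores.append(model.score(X[test_idx], y[test_idx]))  # test on this fold

manual_scores = np.array(manual_scores)
library_scores = cross_val_score(svm.SVC(), X, y, cv=kfold)

print(manual_scores)
print(library_scores)
```

Note that the Iris rows are ordered by class, so unshuffled folds each test on a class that was absent from training and the scores come out very low. Passing shuffle=True to KFold is one way around this.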

Stratified k-fold cross-validation

Depending on the training data, the samples may be sorted by label. Suppose you have 9 labels: A, A, A, B, B, B, C, C, C. Splitting these into three consecutive folds puts a single class in each fold, so every model is tested on a class it never saw during training. Stratified k-fold cross-validation addresses this: it splits the data so that each fold has approximately the same class ratio as the whole dataset.

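from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
iris = load_iris()
logres = LogisticRegression(max_iter=1000)  # assumed: a logistic regression model
stratifiedkfold = StratifiedKFold(n_splits=3)  # each fold keeps the class ratio
scores = cross_val_score(logres, iris.data, iris.target, cv=stratifiedkfold)
print(scores)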

Summary

This article covered the purpose of cross-validation, plain k-fold cross-validation, and stratified k-fold cross-validation. In general, plain k-fold cross-validation is used for regression problems, and stratified k-fold cross-validation is used for classification problems.
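For the regression case, a plain KFold can be used in the same way. The sketch below assumes synthetic data from make_regression and a LinearRegression model, neither of which appears in the article; for regressors, cross_val_score reports the R^2 score by default.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (an assumption for illustration).
X_reg, y_reg = make_regression(
    n_samples=120, n_features=5, noise=10.0, random_state=0
)

# Plain k-fold: there are no classes to stratify on, so shuffle the rows instead.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
r2_scores = cross_val_score(LinearRegression(), X_reg, y_reg, cv=kfold)

print(r2_scores)         # one R^2 value per fold
print(r2_scores.mean())
```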

[PYTHON] Cross Validation improves machine learning model accuracy
