This article summarizes cross-validation. When the number of submissions is limited, as on Kaggle, it is unwise to measure accuracy by submitting. Moreover, if you tune a model against submission scores, its accuracy may drop in the final evaluation because the model does not generalize well.
Therefore, part of the training data is set aside as validation data. You can specify what percentage of the total to set aside.
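As a minimal sketch of setting aside a fixed percentage, the snippet below holds out 20% of a dataset with train_test_split. The iris dataset, the 20% figure, and the names x_fit and x_valid are assumptions chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: iris stands in for your own training data.
iris = load_iris()

# test_size=0.2 sets aside 20% of the rows as validation data.
x_fit, x_valid, y_fit, y_valid = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

print(x_fit.shape, x_valid.shape)
```

A single split like this gives only one score, and that score depends on which rows happen to be held out. Cross-validation repeats the split several times so that every row is used for validation once.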
Let's just write a model. Suppose the training data is given as follows.
x_train #Explanatory variable
y_train #Objective variable
I will use a simple SVM again this time.
from sklearn import svm
from sklearn.model_selection import cross_val_score
lng = svm.SVC()
score = cross_val_score(lng, x_train, y_train)
print(score)
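The default is 5 splits (in scikit-learn 0.22 and later). The call is cross_val_score(learner, explanatory variables, objective variable), and it returns one score per split.
To change the number of splits, pass a KFold object through the cv argument as follows.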
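To make clear what cross_val_score does internally, here is a rough hand-written equivalent. It assumes the iris dataset as sample data, and it relies on the fact that for a classifier scikit-learn uses a stratified splitter by default (stratification is explained further down). This is a sketch for understanding, not a replacement for the library function.

```python
import numpy as np
from sklearn import svm
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Assumption: iris stands in for x_train / y_train.
iris = load_iris()
X, y = iris.data, iris.target
lng = svm.SVC()

manual_scores = []
for train_idx, valid_idx in StratifiedKFold(n_splits=5).split(X, y):
    model = clone(lng)                      # fresh, unfitted copy for each fold
    model.fit(X[train_idx], y[train_idx])   # train on 4 of the 5 parts
    manual_scores.append(model.score(X[valid_idx], y[valid_idx]))  # accuracy on the rest
manual_scores = np.array(manual_scores)

auto_scores = cross_val_score(lng, X, y)

print(manual_scores)
print(auto_scores)
print(manual_scores.mean(), manual_scores.std())
```

In practice you usually report the mean of the scores, together with the standard deviation to show how much the result varies between folds.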
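from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
iris = load_iris()  # the iris dataset is used here as sample data
kfold = KFold(n_splits=3)  # 3 consecutive folds, no shuffling
lng = svm.SVC()
scores = cross_val_score(lng, iris.data, iris.target, cv=kfold)
print(scores)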
The data is now split into three parts, and scores holds one value per part.
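To see what the three parts actually contain, the following sketch prints the size of each split and the classes found in its validation part. It assumes the iris dataset, whose rows are stored sorted by class (50 rows for each of the 3 classes).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold

iris = load_iris()
kfold = KFold(n_splits=3)  # consecutive folds, no shuffling

fold_classes = []
for fold, (train_idx, valid_idx) in enumerate(kfold.split(iris.data)):
    # Which class labels appear in this validation fold?
    classes = np.unique(iris.target[valid_idx])
    fold_classes.append(classes.tolist())
    print(fold, len(train_idx), len(valid_idx), classes)
```

Each validation fold ends up holding a single class, so every model is validated on a class it never saw during training. One way around this is KFold(n_splits=3, shuffle=True, random_state=0), which shuffles the rows before splitting.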
Depending on the training data, the samples may be sorted by class. Suppose you have nine samples whose labels are A, A, A, B, B, B, C, C, C. If you split these into three consecutive parts, each part contains only one class, so the model is trained on two classes and validated on a third that it has never seen. The split therefore has to preserve the class balance. This is what stratified k-fold cross-validation does: it splits the data so that the class ratio in each part is about the same as in the whole dataset.
from sklearn.model_selection import StratifiedKFold
# Reuses iris and the SVC learner lng from the previous example
stratifiedkfold = StratifiedKFold(n_splits=3)
scores = cross_val_score(lng, iris.data, iris.target, cv=stratifiedkfold)
print(scores)
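The nine-label example can be checked directly. This sketch compares the validation labels produced by KFold and StratifiedKFold; the placeholder feature array X and the helper validation_labels are hypothetical names introduced only for this illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

y = np.array(["A", "A", "A", "B", "B", "B", "C", "C", "C"])
X = np.zeros((len(y), 1))  # placeholder features; only the labels matter here

def validation_labels(splitter):
    """Return the sorted validation labels of each fold."""
    return [sorted(y[valid_idx].tolist()) for _, valid_idx in splitter.split(X, y)]

plain = validation_labels(KFold(n_splits=3))
stratified = validation_labels(StratifiedKFold(n_splits=3))

print(plain)       # each fold holds a single class
print(stratified)  # each fold holds one sample of every class
```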
This article covered why cross-validation matters, plain k-fold cross-validation, and stratified k-fold cross-validation. In general, plain k-fold cross-validation is used for regression problems, and stratified k-fold cross-validation is used for classification problems.
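For the regression case, a minimal sketch might look like the following. The diabetes dataset, the Ridge model, and the shuffle settings are assumptions picked for illustration; for a regressor the default score is the coefficient of determination (R^2) rather than accuracy.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Assumption: the bundled diabetes dataset serves as sample regression data.
diabetes = load_diabetes()

# Plain k-fold; there are no class labels to stratify on in regression.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(Ridge(), diabetes.data, diabetes.target, cv=kfold)

print(scores, scores.mean())
```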