[Python] Improving machine learning model accuracy with cross-validation

Overview

This article summarizes cross-validation. In a setting like Kaggle, where the number of submissions is limited, it is unwise to check accuracy by submitting. Moreover, if you tune accuracy through repeated submissions, the score may drop in the final evaluation because of poor generalization performance.

Therefore, a portion of the training data is set aside as validation data. You can specify what fraction of the whole to use for this portion.
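Such a hold-out split can be done with scikit-learn's train_test_split. A minimal sketch on toy data (the 20% fraction and the variable names are just an illustration, not from the article):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples, 1 feature
X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

# Hold out 20% of the data as validation data (test_size controls the fraction)
x_train, x_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(x_train), len(x_valid))  # 8 2
```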

Cross-validation

First, just write a model. Suppose the training data is given as follows.

x_train  # explanatory variables (features)
y_train  # objective variable (target)

As before, I will simply use an SVM.

from sklearn import svm
from sklearn.model_selection import cross_val_score

lng = svm.SVC()
score = cross_val_score(lng, x_train, y_train)  # 5-fold cross-validation by default
print(score)

The default is 5 splits. The call signature is cross_val_score(estimator, explanatory variables, objective variable).
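As a self-contained check that the default really is five folds, here is a sketch on the iris dataset (the article's own x_train/y_train would work the same way; averaging the fold scores is the usual single-number summary):

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

iris = load_iris()
clf = svm.SVC()

# No cv argument: cross_val_score uses 5-fold cross-validation by default
scores = cross_val_score(clf, iris.data, iris.target)
print(len(scores))     # 5 scores, one per fold
print(scores.mean())   # average accuracy over the folds
```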

If you want to change the number of splits, write the following.

from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold

iris = load_iris()  # the iris dataset is used as the example data here
kfold = KFold(n_splits=3)
lng = svm.SVC()
scores = cross_val_score(lng, iris.data, iris.target, cv=kfold)
print(scores)

The data is now split into three folds.

Stratified k-fold cross-validation

Depending on the training data, the samples may be arranged in a regular order. Suppose you have nine answers: A, A, A, B, B, B, C, C, C. If you simply cut these into three parts, each fold contains only a single class and learning is biased, so the split has to be done more carefully. This is what stratified k-fold cross-validation is for: it splits the data so that each fold has roughly the same class ratio as the whole.
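The A, A, A, B, B, B, C, C, C example above can be verified directly. This sketch uses dummy features, since only the labels matter for the split:

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# The nine-answer example from the text: A, A, A, B, B, B, C, C, C
X = np.zeros((9, 1))                # dummy features
y = np.array(list("AAABBBCCC"))

# Plain KFold just cuts the ordered data into thirds,
# so each test fold contains only one class.
for _, test_idx in KFold(n_splits=3).split(X):
    print(y[test_idx].tolist())     # ['A','A','A'], then all B, then all C

# StratifiedKFold keeps the class ratio in every fold,
# so each test fold gets one A, one B, and one C.
for _, test_idx in StratifiedKFold(n_splits=3).split(X, y):
    print(y[test_idx].tolist())     # one sample of each class per fold
```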

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = load_iris()
logres = LogisticRegression(max_iter=1000)  # the classifier evaluated with stratified folds
stratifiedkfold = StratifiedKFold(n_splits=3)
scores = cross_val_score(logres, iris.data, iris.target, cv=stratifiedkfold)
print(scores)

Summary

We have covered the significance of cross-validation, ordinary cross-validation, and stratified k-fold cross-validation. In general, ordinary cross-validation is used for regression problems, and stratified k-fold cross-validation is used for classification problems.
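To illustrate the regression case mentioned above, here is a minimal sketch with plain (unstratified) k-fold. The synthetic data and LinearRegression are my own choices for illustration, not from the article; the default score reported for a regressor is R^2:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic near-linear regression data (illustration only)
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=100)

# Plain k-fold is the usual choice for regression problems
scores = cross_val_score(LinearRegression(), X, y, cv=KFold(n_splits=5))
print(scores.mean())  # mean R^2 over the 5 folds
```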
