[PYTHON] Evaluate the performance of a simple regression model using LeaveOneOut cross-validation

Introduction

When analyzing a small dataset, we used LeaveOneOut cross-validation to evaluate a model, so we are sharing the approach here.

LeaveOneOut (LOO) cross-validation takes n samples of data, holds out one sample as test data, and trains on the remaining n - 1 samples. The held-out sample is then swapped, and this is repeated n times so that every sample serves as test data exactly once; the n errors are combined to evaluate the performance of the model. In terms of k-fold cross-validation, this is the case where k equals the number of samples n. It tends to be used when the amount of data is small.
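
To make the splitting concrete, here is a minimal sketch that lists the train and test indexes LeaveOneOut produces. The four-sample array X is a hypothetical placeholder, not data from this post; the comparison with KFold is only there to illustrate the "k equals n" remark above.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Four hypothetical samples; the values themselves do not matter here.
X = np.array([[10.0], [20.0], [30.0], [40.0]])

# Collect (train indexes, test indexes) for every LOO split.
loo_splits = [
    (train.tolist(), test.tolist()) for train, test in LeaveOneOut().split(X)
]

# k-fold with k equal to the number of samples, without shuffling.
kfold_splits = [
    (train.tolist(), test.tolist())
    for train, test in KFold(n_splits=len(X)).split(X)
]

for train_idx, test_idx in loo_splits:
    print("train:", train_idx, "test:", test_idx)

print("same as KFold(n_splits=n):", loo_splits == kfold_splits)
```

Each of the four samples appears as test data exactly once, and the splits match those of KFold with n_splits set to the number of samples.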

Below, we will evaluate simple regression using the LOO method.

Code

Given a DataFrame df, the explanatory variable used for the simple regression is the column named by loo_column, and the objective variable is assumed to be in the DataFrame's mokuteki column. The function fits the model n times while swapping the held-out sample, and finally calculates and returns the root mean squared error (RMSE). statsmodels is used for the simple regression.

loo.py


import numpy as np
from sklearn.model_selection import LeaveOneOut
from statsmodels import api as sm

loo_column = "setsumei"

def loo_rmse(df, loo_column):
    loo_X = df[loo_column]

    # Add the constant (intercept) term for the simple regression.
    loo_X = sm.add_constant(loo_X)
    # Objective variable, taken from the "mokuteki" column of the same DataFrame.
    loo_y = df["mokuteki"]

    loo = LeaveOneOut()

    # List that collects the squared error of each held-out sample.
    se_list = list()

    # Iterate over the splits, swapping which index is used for test.
    for train_index, test_index in loo.split(loo_X):
        X_train, X_test = loo_X.iloc[train_index], loo_X.iloc[test_index]
        y_train, y_test = loo_y.iloc[train_index], loo_y.iloc[test_index]

        # Fit the simple regression on the training samples.
        model = sm.OLS(y_train, X_train)
        result = model.fit()

        # Predict the single test sample from the fitted parameters and get the error.
        pred = result.params["const"] + result.params[loo_column] * X_test[loo_column].values[0]
        diff = pred - y_test.values[0]

        # Square the error and save it.
        se_list.append(diff**2)

    # Average the squared errors, take the square root, then print and return it.
    rmse = np.sqrt(np.array(se_list).mean())
    print("RMSE:", rmse)
    return rmse
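
As a cross-check on the hand-written loop, the same quantity can be sketched with scikit-learn alone. This is not the method used above: it swaps statsmodels OLS for sklearn's LinearRegression (both fit ordinary least squares with an intercept), and the toy data x and y are hypothetical values generated only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical toy data: one explanatory variable and a noisy linear target.
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
X = x.reshape(-1, 1)  # sklearn expects a 2-D feature array

# Each LOO fold holds a single test sample, so every per-fold score is the
# negated squared error of that one prediction.
neg_sq_errors = cross_val_score(
    LinearRegression(), X, y, cv=LeaveOneOut(), scoring="neg_mean_squared_error"
)

# Average the squared errors and take the square root to get the LOO RMSE.
loo_rmse_value = float(np.sqrt(-neg_sq_errors.mean()))
print("RMSE:", loo_rmse_value)
```

Because both approaches fit the same least-squares line on the same splits, the value should agree with the loop-based function up to floating-point error when given the same data.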

Reference

http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeaveOneOut.html

Thank you very much.
