[PYTHON] How to evaluate machine learning regression models (mean squared error and coefficient of determination)

○ The main point of this article: a note on what I learned about evaluating regression problems.

○ Source code (Python): For linear regression model

Learning and evaluating regression problems


from sklearn.datasets import load_boston  # Note: removed in scikit-learn 1.2, so this needs an older version
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.svm import SVR
%matplotlib inline

# Prepare the data: Boston house prices
data = load_boston()
X = data.data[:, [5,]]  # Use only the number of rooms (column 5, RM) as the explanatory variable
y = data.target

# Fit a simple (single-variable) linear regression model
model_l = LinearRegression()
model_l.fit(X, y)
# Slope a and intercept b of the regression line y = ax + b
print(model_l.coef_)  # a: slope
print(model_l.intercept_)  # b: intercept

# Predict
l_pred = model_l.predict(X)

# Plot the data and the regression line
fig, ax = plt.subplots()
ax.scatter(X, y, color='red', marker='s', label='data')
ax.plot(X, l_pred, color='blue', label='regression curve')
ax.legend()
plt.show()

Result:
[9.10210898]
-34.67062077643857
(Figure: scatter plot of the data with the regression line)

Does it look about right?
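The two printed values are the slope a and the intercept b of the line y = ax + b. As a minimal sketch of where such values come from, the closed-form least-squares solution for a single explanatory variable can be computed by hand. The toy lists and the `predict_line` helper below are hypothetical and are not the Boston data; for one feature, ordinary least squares as fitted by `LinearRegression` should give the same slope and intercept.

```python
# Toy data (hypothetical, not the Boston data)
toy_x = [4.0, 5.0, 6.0, 7.0, 8.0]       # e.g. number of rooms
toy_y = [10.0, 14.0, 21.0, 24.0, 31.0]  # e.g. price

mean_x = sum(toy_x) / len(toy_x)
mean_y = sum(toy_y) / len(toy_y)

# Least-squares slope: co-variation of x and y divided by the variation of x
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(toy_x, toy_y))
s_xx = sum((xi - mean_x) ** 2 for xi in toy_x)
a = s_xy / s_xx          # slope (corresponds to coef_)
b = mean_y - a * mean_x  # intercept (corresponds to intercept_)


def predict_line(x_new):
    """Predict with the fitted line y = ax + b."""
    return a * x_new + b


print(a, b)               # slope is about 5.2, intercept about -11.2
print(predict_line(6.0))  # the line passes through (mean_x, mean_y), so about 20.0
```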

Mean squared error

Square the error between each predicted value and the actual value, then take the average of all the squares. The smaller the value, the closer the predictions are to the actual values.

Mean squared error


mean_squared_error(y, l_pred)
43.60055177116956
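To make the definition concrete, here is a minimal sketch that computes the mean squared error by hand on two short lists. The values and the `mse` helper are made up for illustration; `mean_squared_error` from scikit-learn should return the same number for the same inputs.

```python
# Hypothetical actual values and predictions (not the Boston data)
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.0, 8.0, 9.5]


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Squared errors are 0.25, 0.0, 1.0, 0.25, so the mean is 1.5 / 4
print(mse(y_true, y_hat))  # 0.375
```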

Coefficient of determination

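An index of prediction accuracy derived from the mean squared error. Its maximum is 1 (a perfect fit); 0 corresponds to always predicting the mean of the actual values, and it can be negative when the model does worse than that. Because the error is normalized by the variance of the actual values, scores remain comparable even between data with different scales.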

Coefficient of determination


r2_score(y, l_pred)
0.48352545599133423
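As a sketch of how this score relates to the squared error, the coefficient of determination can be written as 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. The lists and the `r2` helper below are hypothetical; `r2_score` should agree with them for the same inputs.

```python
# Hypothetical actual values (not the Boston data)
y_true = [3.0, 5.0, 7.0, 9.0]


def r2(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


good = [2.5, 5.0, 8.0, 9.5]       # close to the actual values
mean_only = [6.0, 6.0, 6.0, 6.0]  # always predicts the mean of y_true
reversed_ = [9.0, 7.0, 5.0, 3.0]  # worse than predicting the mean

print(r2(y_true, good))       # about 0.925
print(r2(y_true, mean_only))  # 0.0
print(r2(y_true, reversed_))  # -3.0, so the score is not limited to the range 0 to 1
```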

Is this a good result for the linear regression model? To find out, let's try another algorithm.

○ Source code (Python): For SVR model

SVR learning and evaluation


# Train an SVR (support vector regression, a kernel method)
model_s = SVR(C=0.01, kernel='linear')  # Linear kernel with regularization parameter C=0.01
model_s.fit(X, y)

# Slope a and intercept b of the regression line y = ax + b
print(model_s.coef_)  # a: slope (only available with the linear kernel)
print(model_s.intercept_)  # b: intercept

# Predict
s_pred = model_s.predict(X)

# Plot the data, the linear regression line, and the SVR line
fig, ax = plt.subplots()
ax.scatter(X, y, color='red', marker='s', label='data')
ax.plot(X, l_pred, color='blue', label='regression curve')
ax.plot(X, s_pred, color='black', label='svr curve')
ax.legend()
plt.show()

Result:
[[1.64398]]
[11.13520958]
(Figure: the data with the linear regression line and the SVR line)

Does the SVR line fit a little worse?

Mean squared error and coefficient of determination


mean_squared_error(y, s_pred)
72.14197118147209
r2_score(y, s_pred)
0.14543531775956597

Linear regression does better here. However, the result may change if the SVR hyperparameters (the regularization parameter and the kernel) are changed.

○ Source code (Python): For SVR model

Learning and evaluation of SVR (changing hyperparameters)


model_s2 = SVR(C=1.0, kernel='rbf')  # RBF kernel with regularization parameter C=1.0
model_s2.fit(X, y)

# Predict
s_pred2 = model_s2.predict(X)

# Plot; sort by X so the non-linear curve is drawn from left to right
order = X[:, 0].argsort()
fig, ax = plt.subplots()
ax.scatter(X, y, color='red', marker='s', label='data')
ax.plot(X, l_pred, color='blue', label='regression curve')
ax.plot(X, s_pred, color='black', label='svr curve')
ax.plot(X[order], s_pred2[order], color='orange', label='svr_rbf curve')
ax.legend()
plt.show()

Result:
(Figure: the data with the linear regression line, the linear SVR line, and the RBF SVR curve)

Does the smooth curve look like a good fit?

Mean squared error and coefficient of determination


mean_squared_error(y, s_pred2)
37.40032481992347
r2_score(y, s_pred2)
0.5569708427424378

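This is better than linear regression, so the result depends on the hyperparameters. Note that all scores here are computed on the same data that was used for training.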
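Since the score changes with the kernel and the regularization parameter, one way to compare several settings systematically is a grid search. The following is only a sketch under stated assumptions: it uses synthetic data instead of the Boston data, and the grid values, `X_demo`, and `y_demo` are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic one-feature data with a non-linear trend (assumption: not the Boston data)
rng = np.random.default_rng(0)
X_demo = rng.uniform(3.0, 9.0, size=(120, 1))
y_demo = 10.0 * np.sin(X_demo[:, 0]) + 20.0 + rng.normal(0.0, 1.0, size=120)

# Candidate hyperparameters: kernel and regularization parameter C
param_grid = {"kernel": ["linear", "rbf"], "C": [0.01, 1.0, 100.0]}

# Score every combination by the coefficient of determination with 5-fold cross-validation
search = GridSearchCV(SVR(), param_grid, scoring="r2", cv=5)
search.fit(X_demo, y_demo)

print(search.best_params_)  # the combination with the highest mean R^2
print(search.best_score_)   # its mean cross-validated R^2
```

Unlike the scores in this article, `best_score_` is averaged over held-out folds rather than computed on the training data, so it is usually somewhat lower.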

■ Impressions
・ I studied how to evaluate regression problems. Both the mean squared error and the coefficient of determination are based on the difference between the predicted values and the actual data. The program compares the list of actual values with the list of predicted values, takes the difference for each data point, squares it, and averages the results. This is easy to understand intuitively.
