○ Main points of this article: notes from learning how to evaluate regression problems.
Learning and evaluating regression problems
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.svm import SVR
%matplotlib inline
#Data preparation: Boston house prices
#Note: load_boston is deprecated and was removed in scikit-learn 1.2+
data = load_boston()
X = data.data[:, [5,]] #Use only the average number of rooms (RM) as the explanatory variable
y = data.target
#Train a simple linear regression model
model_l = LinearRegression()
model_l.fit(X, y)
#Slope a and intercept b of the regression line y = ax + b
print(model_l.coef_) #a: slope
print(model_l.intercept_) #b: intercept
#Predict
l_pred = model_l.predict(X)
#Plot the results
fig, ax = plt.subplots()
ax.scatter(X, y, color='red', marker='s', label='data')
ax.plot(X, l_pred, color='blue', label='regression curve')
ax.legend()
plt.show()
Result: [9.10210898] -34.67062077643857
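As a sanity check, the output of predict() should match computing a*x + b by hand. A minimal sketch (the 6-room input is a hypothetical example of mine):

import numpy as np

#Apply y = a*x + b by hand and compare with predict()
a, b = model_l.coef_[0], model_l.intercept_
x_rooms = 6.0  #hypothetical example input
print(a * x_rooms + b)                         #9.102 * 6 - 34.67 = about 19.94
print(model_l.predict(np.array([[x_rooms]])))  #should print the same value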
So the slope is about 9.1 and the intercept about -34.7. Looks about right?
Mean squared error
Square the error between each predicted value and the actual value, then take the average. The smaller the value, the more accurate the predictions.
mean_squared_error(y, l_pred)
43.60055177116956
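For intuition, the same value can be computed directly with NumPy (a sketch; it should match mean_squared_error above):

import numpy as np

#MSE by hand: average of the squared differences between actual and predicted values
print(np.mean((y - l_pred) ** 2))  #about 43.60, same as mean_squared_error(y, l_pred)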
Coefficient of determination
A score that expresses prediction accuracy, derived from the mean squared error. It takes values up to 1 (typically between 0 and 1; higher is better), and because it is scale-free, models can be compared even across data with different value ranges.
r2_score(y, l_pred)
0.48352545599133423
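This score can also be written out by hand: 1 minus the ratio of the model's squared error to the squared error of always predicting the mean. A sketch that should reproduce r2_score:

import numpy as np

#R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - l_pred) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
print(1 - ss_res / ss_tot)  #about 0.4835, same as r2_score(y, l_pred)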
Is this linear regression result any good? Next, let's try another algorithm.
SVR learning and evaluation
#Train an SVR (support vector regression, a kernel method)
model_s = SVR(C=0.01, kernel='linear') #Linear kernel with regularization parameter C=0.01
model_s.fit(X, y)
#Slope a and intercept b of the regression line y = ax + b
print(model_s.coef_) #a: slope
print(model_s.intercept_) #b: intercept
#Predict
s_pred = model_s.predict(X)
#Plot the results
fig, ax = plt.subplots()
ax.scatter(X, y, color='red', marker='s', label='data')
ax.plot(X, l_pred, color='blue', label='regression curve')
ax.plot(X, s_pred, color='black', label='svr curve')
ax.legend()
plt.show()
Result: [[1.64398]] [11.13520958]
Is SVR a bit underwhelming??
Mean squared error and coefficient of determination
mean_squared_error(y, s_pred)
72.14197118147209
r2_score(y, s_pred)
0.14543531775956597
Linear regression wins for now. But wait!! The result might change if I tune SVR's hyperparameters (the regularization parameter and the kernel), so let's check that, as sketched below.
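As a rough first check, a quick loop can compare several regularization strengths for the linear kernel (a minimal sketch; the candidate C values are my own choice):

#Compare training-set R^2 for several values of C (linear kernel)
for C in [0.01, 0.1, 1.0, 10.0]:
    m = SVR(C=C, kernel='linear').fit(X, y)
    print(C, r2_score(y, m.predict(X)))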
Learning and evaluation of SVR (changing hyperparameters)
model_s2 = SVR(C=1.0, kernel='rbf') #RBF kernel with regularization parameter C=1.0
model_s2.fit(X, y)
#Predict
s_pred2 = model_s2.predict(X)
#Plot the results (sort by X so the RBF curve is drawn left to right, not as a zigzag)
order = X[:, 0].argsort()
fig, ax = plt.subplots()
ax.scatter(X, y, color='red', marker='s', label='data')
ax.plot(X[order], l_pred[order], color='blue', label='regression curve')
ax.plot(X[order], s_pred[order], color='black', label='svr curve')
ax.plot(X[order], s_pred2[order], color='orange', label='svr_rbf curve')
ax.legend()
plt.show()
Result: (plot of the data points with the regression, svr, and svr_rbf curves)
The RBF curve is smooth and fits nicely, doesn't it?
Mean squared error and coefficient of determination
mean_squared_error(y, s_pred2)
37.40032481992347
r2_score(y, s_pred2)
0.5569708427424378
Better than linear regression. The result really depends on the hyperparameters.
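Tuning by hand gets tedious, and all the scores above were computed on the training data itself. scikit-learn's GridSearchCV searches hyperparameter combinations with cross-validation, which also gives a fairer estimate. A minimal sketch (the parameter grid is my own choice):

from sklearn.model_selection import GridSearchCV

#Search over kernel and regularization strength with 5-fold cross-validation
param_grid = {'C': [0.01, 0.1, 1.0, 10.0], 'kernel': ['linear', 'rbf']}
search = GridSearchCV(SVR(), param_grid, cv=5, scoring='r2')
search.fit(X, y)
print(search.best_params_)  #best hyperparameter combination found
print(search.best_score_)   #cross-validated R^2 for that combination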
■ Impressions
・Studied how to evaluate regression problems. Both the mean squared error and the coefficient of determination are based on the "difference between the predicted values and the actual data": the program compares the list of correct answers with the list of predictions, takes the difference for each pair, squares the differences, and averages them. Intuitively easy to understand.