This time, I will summarize the implementation (code) of linear regression.
We will proceed with the following 6 steps: (1) import the required modules, (2) load the dataset, (3) split it into training and test data, (4) fit the model by the least squares method, (5) predict on the test data and check the residuals, and (6) evaluate the model with MSE and r2_score.
First, import the required modules.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Module for loading the dataset
from sklearn.datasets import load_boston
# Module for splitting data into training and test sets
from sklearn.model_selection import train_test_split
# Module for linear regression (least squares method)
from sklearn.linear_model import LinearRegression
# Module for evaluation by mean squared error (MSE)
from sklearn.metrics import mean_squared_error
# Module for evaluation by coefficient of determination (r2_score)
from sklearn.metrics import r2_score
# Load the Boston dataset
boston = load_boston()
# Separate into explanatory variables (X) and objective variable (y)
X, y = boston.data, boston.target
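Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the import above fails on recent versions. As a workaround, the snippet below (adapted from scikit-learn's own deprecation notice) builds the same X and y from the original source:
# Assemble the Boston data from the original source (two physical rows per record)
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])  # 13 explanatory variables
y = raw_df.values[1::2, 2]  # objective variable (house prices)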
# Split into training data and test data (30% held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
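Note that no random_state is passed above, so the split (and every number shown below) will differ from run to run. For reproducible results you could fix the seed, e.g.:
# Fix the seed so the split (and the numbers below) are reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)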
# Create a linear regression instance
lr = LinearRegression()
# Find each weight by the least squares method
lr.fit(X_train, y_train)
# Print the intercept
print(lr.intercept_)
# Print the regression coefficients (slopes)
print(lr.coef_)
Output result
lr.intercept_: 22.38235838998073
lr.coef_: [-1.30942592e-01 3.93397530e-02 3.34152241e-02 2.69312969e+00
-1.77337676e+01 3.95093181e+00 -1.16396424e-03 -1.51204896e+00
3.36066399e-01 -1.37052283e-02 -9.54346277e-01 8.23484120e-03
-5.17616471e-01]
lr.intercept_: the intercept (weight $w_0$)
lr.coef_: the regression coefficients / slopes (weights $w_1$ to $w_{13}$)
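If you want to see which coefficient belongs to which feature, the feature names are available on the loaded dataset object (a small addition of mine, not part of the original steps):
# Pair each feature name with its learned weight
for name, w in zip(boston.feature_names, lr.coef_):
    print(f'{name}: {w:.4f}')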
From these, we have concrete numerical values for the following model formula (regression equation).
$y = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_{12} x_{12} + w_{13} x_{13}$
Next, feed the test data (X_test) into the model fitted above to obtain the predicted values (y_pred).
# Predict target values for the test data
y_pred = lr.predict(X_test)
y_pred
Output result
y_pred: [34.21868721 14.30330361 18.5687412 19.17046762 22.60218908 31.75197222
13.56424899 19.9953213 36.91942317 ..... 25.08561495, 13.68910956]
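As a quick sanity check (my addition, not one of the original 6 steps), the regression formula above can be evaluated by hand with the fitted weights and compared against lr.predict:
# Apply y = w0 + w1*x1 + ... + w13*x13 manually to every test sample
manual_pred = lr.intercept_ + X_test @ lr.coef_
# The manual result should match lr.predict up to floating-point error
print(np.allclose(manual_pred, y_pred))  # True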
# Create the figure and subplot objects
fig, ax = plt.subplots()
# Residual plot: predicted values vs. residuals
ax.scatter(y_pred, y_pred - y_test, marker='o')
# Draw a red horizontal line at y = 0
ax.hlines(y=0, xmin=-10, xmax=50, linewidth=2, color='red')
# Set the axis labels
ax.set_xlabel('y_pred')
ax.set_ylabel('y_pred - y_test')
# Add a graph title
ax.set_title('Residual Plot')
plt.show()
Output result
The points are distributed evenly above and below the red line (y_pred - y_test = 0), which confirms that the predictions have no large systematic bias.
In linear regression, the model is evaluated using the following two metrics.
・Mean squared error (MSE)
・Coefficient of determination (r2_score)
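For reference, the standard definitions are as follows, where $y_i$ is an observed value, $\hat{y}_i$ the corresponding prediction, and $\bar{y}$ the mean of the observed values:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
A lower MSE is better, while r2_score approaches 1 as the model explains more of the variance in the test data.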
# Mean squared error (MSE)
print(mean_squared_error(y_test, y_pred))
# Coefficient of determination (r2_score)
print(r2_score(y_test, y_pred))
Output result
MSE: 19.858434053618794
r2_score: 0.7579111486552739
From the above, we were able to evaluate the model with both metrics: the MSE is about 19.9, and the r2_score of about 0.76 means the model explains roughly 76% of the variance in the test data (your exact numbers will differ because the split is random).
In this way, linear regression models are created and evaluated following steps 1 to 6 above.
This time, for beginners, I summarized only the implementation (code). When the timing is right, I would like to write a follow-up article on the theory (the mathematics).
Thank you for reading.