This time, I will summarize the implementation (code) of linear regression.
We will proceed with the following 6 steps: (1) import the required modules, (2) load the dataset, (3) split it into training and test data, (4) fit the model by the least squares method, (5) predict on the test data and check the residuals, and (6) evaluate the model with MSE and r2_score.
First, import the required modules.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Module for loading the dataset
from sklearn.datasets import load_boston
# Module for splitting data into training and test sets
from sklearn.model_selection import train_test_split
# Module for linear regression (least squares method)
from sklearn.linear_model import LinearRegression
# Module for evaluation by mean squared error (MSE)
from sklearn.metrics import mean_squared_error
# Module for evaluation by coefficient of determination (r2_score)
from sklearn.metrics import r2_score
# Load the Boston dataset
boston = load_boston()
# Separate into explanatory variables (X) and objective variable (y)
X, y = boston.data, boston.target
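Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the import above fails on recent versions. As a workaround, the snippet below (adapted from scikit-learn's own deprecation notice) builds the same X and y from the original source:
# Assemble the Boston data from the original source (two physical rows per record)
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])  # 13 explanatory variables
y = raw_df.values[1::2, 2]  # objective variable (house prices)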
# Split into training data and test data (30% held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
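Note that no random_state is passed above, so the split (and every number shown below) will differ from run to run. For reproducible results you could fix the seed, e.g.:
# Fix the seed so the split (and the numbers below) are reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)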
# Create a linear regression instance
lr = LinearRegression()
# Find each weight by the least squares method
lr.fit(X_train, y_train)
# Print the intercept
print(lr.intercept_)
# Print the regression coefficients (slopes)
print(lr.coef_)
Output result
lr.intercept_: 22.38235838998073
lr.coef_: [-1.30942592e-01 3.93397530e-02 3.34152241e-02 2.69312969e+00
-1.77337676e+01 3.95093181e+00 -1.16396424e-03 -1.51204896e+00
3.36066399e-01 -1.37052283e-02 -9.54346277e-01 8.23484120e-03
-5.17616471e-01]
lr.intercept_: the intercept (weight $w_0$)
lr.coef_: the regression coefficients / slopes (weights $w_1$ to $w_{13}$)
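If you want to see which coefficient belongs to which feature, the feature names are available on the loaded dataset object (a small addition of mine, not part of the original steps):
# Pair each feature name with its learned weight
for name, w in zip(boston.feature_names, lr.coef_):
    print(f'{name}: {w:.4f}')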
From these, we have concrete numerical values for the following model formula (regression equation).
$y = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_{12} x_{12} + w_{13} x_{13}$
Next, feed the test data (X_test) into the model fitted above to obtain the predicted values (y_pred).
# Predict target values for the test data
y_pred = lr.predict(X_test)
y_pred
Output result
y_pred: [34.21868721 14.30330361 18.5687412 19.17046762 22.60218908 31.75197222
13.56424899 19.9953213 36.91942317 ..... 25.08561495, 13.68910956]
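As a quick sanity check (my addition, not one of the original 6 steps), the regression formula above can be evaluated by hand with the fitted weights and compared against lr.predict:
# Apply y = w0 + w1*x1 + ... + w13*x13 manually to every test sample
manual_pred = lr.intercept_ + X_test @ lr.coef_
# The manual result should match lr.predict up to floating-point error
print(np.allclose(manual_pred, y_pred))  # True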
# Create the figure and subplot objects
fig, ax = plt.subplots()
# Residual plot: predicted values vs. residuals
ax.scatter(y_pred, y_pred - y_test, marker='o')
# Draw a red horizontal line at y = 0
ax.hlines(y=0, xmin=-10, xmax=50, linewidth=2, color='red')
# Set the axis labels
ax.set_xlabel('y_pred')
ax.set_ylabel('y_pred - y_test')
# Add a graph title
ax.set_title('Residual Plot')
plt.show()
Output result
The points are distributed evenly above and below the red line (y_pred - y_test = 0), which confirms that the predictions have no large systematic bias.
In linear regression, the model is evaluated using the following two metrics.
・Mean squared error (MSE)
・Coefficient of determination (r2_score)
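For reference, the standard definitions are as follows, where $y_i$ is an observed value, $\hat{y}_i$ the corresponding prediction, and $\bar{y}$ the mean of the observed values:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
A lower MSE is better, while r2_score approaches 1 as the model explains more of the variance in the test data.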
# Mean squared error (MSE)
print(mean_squared_error(y_test, y_pred))
# Coefficient of determination (r2_score)
print(r2_score(y_test, y_pred))
Output result
MSE: 19.858434053618794
r2_score: 0.7579111486552739
From the above, we were able to evaluate the model with both metrics: the MSE is about 19.9, and the r2_score of about 0.76 means the model explains roughly 76% of the variance in the test data (your exact numbers will differ because the split is random).
In this way, linear regression models are created and evaluated following steps 1 to 6 above.
This time, for beginners, I summarized only the implementation (code). When the timing is right, I would like to write a follow-up article on the theory (the mathematics).
Thank you for reading.