[PYTHON] Linear regression (for beginners) -Code edition-

In this article, I summarize the implementation (code) of linear regression.

■ Linear regression procedure

We will proceed with the following 6 steps.

  1. Preparation of module
  2. Data preparation
  3. Create a model
  4. Calculation of predicted value
  5. Residual plot
  6. Model evaluation

1. Preparation of module

First, import the required modules.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#Module to read the dataset
#Note: load_boston was removed in scikit-learn 1.2, so this import requires an older version
from sklearn.datasets import load_boston

#Module that separates training data and test data
from sklearn.model_selection import train_test_split

#Module that performs linear regression (least squares method)
from sklearn.linear_model import LinearRegression

#Module evaluated by mean squared error (MSE)
from sklearn.metrics import mean_squared_error

#Module evaluated by the coefficient of determination (r2_score)
from sklearn.metrics import r2_score

2. Data preparation

After acquiring the data, divide it for easy processing.

#Loading Boston dataset
boston = load_boston()

#Divide into explanatory variables (X) and objective variable (y)
X, y = boston.data, boston.target

#Divide into training data and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
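
To make the split step more concrete, here is a minimal sketch of the idea behind a 70/30 split: shuffle the row indices and hold out 30% of them. It is a simplified illustration written with NumPy, not scikit-learn's actual implementation, and the random data is a hypothetical stand-in with the same shape as the Boston dataset (506 samples, 13 features).

```python
import math
import numpy as np

# Hypothetical stand-in data with the same shape as the Boston dataset
# (506 samples, 13 features); the values themselves are random.
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
y = rng.normal(size=506)

test_size = 0.3
n_samples = len(X)
n_test = math.ceil(test_size * n_samples)  # 152 rows held out for testing

# Shuffle the row indices, then cut them into a test part and a training part
indices = rng.permutation(n_samples)
test_idx, train_idx = indices[:n_test], indices[n_test:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print(X_train.shape, X_test.shape)  # (354, 13) (152, 13)
```

Because the rows are shuffled randomly, the split (and every number computed from it) changes from run to run. With train_test_split itself, passing random_state (for example random_state=0) fixes the shuffle so that the same split is produced each time.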

3. Create a model

First, create an instance of the linear regression model and fit it to the training data. The weights of the model formula (regression formula) are then obtained by the least squares method.

#Create an instance of linear regression
lr = LinearRegression()

#Find each weight by the least squares method
lr.fit(X_train, y_train)

#Output intercept
print(lr.intercept_)

#Output regression coefficient (slope)
print(lr.coef_)


Output result


lr.intercept_: 22.38235838998073

lr.coef_: [-1.30942592e-01  3.93397530e-02  3.34152241e-02  2.69312969e+00
 -1.77337676e+01  3.95093181e+00 -1.16396424e-03 -1.51204896e+00
  3.36066399e-01 -1.37052283e-02 -9.54346277e-01  8.23484120e-03
 -5.17616471e-01]

lr.intercept_: Intercept (weight $ w_0 $)

lr.coef_: Regression coefficients / slopes (weights $ w_1 $ ~ $ w_{13} $)

These are the concrete values of the weights in the following model formula (regression formula).

$ y = w_0 + w_1x_1+w_2x_2+ \cdots + w_{12}x_{12} + w_{13}x_{13}$
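
As a rough illustration of what "obtaining the weights by the least squares method" means for a formula of this form, the sketch below fits the intercept and the coefficients with NumPy's np.linalg.lstsq. The data is synthetic and noise-free, and the names true_w0 and true_w are hypothetical values chosen for the example (three features instead of thirteen), so the fit should recover them almost exactly. This is not how LinearRegression is implemented internally, only the same idea in a few lines.

```python
import numpy as np

# Synthetic, noise-free data generated from known weights (hypothetical values)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w0 = 2.0
true_w = np.array([1.5, -0.5, 3.0])
y = true_w0 + X @ true_w

# Prepend a column of ones so that the first weight acts as the intercept w_0
A = np.column_stack([np.ones(len(X)), X])

# Solve min ||A w - y||^2 for w (least squares)
w, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coef = w[0], w[1:]

print(intercept)  # close to 2.0
print(coef)       # close to [1.5, -0.5, 3.0]
```

Here intercept plays the role of lr.intercept_ and coef plays the role of lr.coef_.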

4. Calculation of predicted value

Pass the test data (X_test) to the model created earlier to get the predicted values (y_pred).


y_pred = lr.predict(X_test)
y_pred


Output result


y_pred: [34.21868721 14.30330361 18.5687412  19.17046762 22.60218908 31.75197222
 13.56424899 19.9953213  36.91942317 ..... 25.08561495, 13.68910956]
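
For a fitted linear model, prediction amounts to evaluating the regression formula: the intercept plus the dot product of each row with the coefficients. The sketch below shows this with small hypothetical numbers (two features, made-up weights), not the values fitted in this article.

```python
import numpy as np

# Hypothetical weights for a model with two features
intercept = 1.0
coef = np.array([2.0, -1.0])

# Two hypothetical samples to predict
X_new = np.array([[1.0, 1.0],
                  [3.0, 0.0]])

# y = w_0 + w_1 * x_1 + w_2 * x_2, evaluated for every row at once
y_pred = intercept + X_new @ coef

print(y_pred)  # [2. 7.]
```

With the fitted model from this article, the same expression would be lr.intercept_ + X_test @ lr.coef_.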

5. Residual plot

Before evaluating the model, let's look at the residual plot.

Residual: difference between the predicted value and the actual value (y_pred - y_test)

#Creating drawing objects and subplots
fig, ax = plt.subplots()

#Residual plot
ax.scatter(y_pred, y_pred - y_test, marker = 'o')

#Plot the red horizontal line y = 0
ax.hlines(y = 0, xmin = -10, xmax = 50, linewidth = 2, color = 'red')
 
#Set the axis label
ax.set_xlabel('y_pred')
ax.set_ylabel('y_pred - y_test')

#Add the graph title
ax.set_title('Residual Plot')

plt.show()


Output result

[Figure: residual plot of y_pred - y_test against y_pred]

The points are evenly scattered above and below the red line (y_pred - y_test = 0), which suggests that the predicted values are not strongly biased in either direction.
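
The visual impression from a residual plot can be backed up with two simple numbers: the mean residual (close to 0 if there is no systematic over- or under-prediction) and the share of points above the line. The sketch below uses small hypothetical arrays, not the article's actual predictions.

```python
import numpy as np

# Hypothetical values, not taken from the article's output
y_test = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 31.0, 39.0])

# Same definition of the residual as in the plot
residuals = y_pred - y_test

mean_residual = residuals.mean()       # 0.0 here: no overall bias
share_above = (residuals > 0).mean()   # 0.5 here: half the points lie above the line

print(mean_residual, share_above)
```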

6. Model evaluation

In linear regression, evaluation is performed using the following two indicators.

・ Mean squared error (MSE)
・ Coefficient of determination (r2_score)


#Mean squared error (MSE)
print(mean_squared_error(y_test, y_pred))

#Coefficient of determination
print(r2_score(y_test, y_pred))


Output result


MSE: 19.858434053618794
r2_score: 0.7579111486552739

From the above, we were able to evaluate using two indicators.
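
To show what the two indicators measure, here is a minimal sketch that computes them by hand from their standard definitions, $ MSE = \frac{1}{n}\sum_{i}(y_i - \hat{y}_i)^2 $ and $ R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2} $. The arrays are small hypothetical values chosen so that the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical actual and predicted values
y_test = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

# Mean squared error: average of the squared prediction errors
mse = np.mean((y_test - y_pred) ** 2)

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_test - y_pred) ** 2)          # 1 + 0 + 1 + 0 = 2
ss_tot = np.sum((y_test - y_test.mean()) ** 2)   # 9 + 1 + 1 + 9 = 20
r2 = 1 - ss_res / ss_tot

print(mse)  # 0.5
print(r2)   # approximately 0.9
```

A smaller MSE means smaller errors, and an r2_score closer to 1 means the model explains more of the variation in the objective variable.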

■ Finally

In linear regression, we will create and evaluate a model based on steps 1 to 6 above.

This time, for beginners, I have summarized only the implementation (code). When I find the time, I would like to write an article about the theory (mathematical formulas).

Thank you for reading.
