Multiple regression analysis and 3D plotting were performed on the Boston House Prices dataset bundled with scikit-learn. (Reference)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston #Removed in scikit-learn 1.2; an older release is required
from sklearn.linear_model import LinearRegression
from mpl_toolkits.mplot3d import Axes3D
#Load Boston House Prices data
boston = load_boston()
boston_df = pd.DataFrame(boston.data)
boston_df.columns = boston.feature_names
boston_df['Price'] = boston.target #Add objective variable to data frame
The dataset contains 13 attributes (columns), such as the average number of rooms per dwelling (RM) and the percentage of lower-status population (LSTAT). First, use the corr() method of the DataFrame class to calculate the correlation coefficients between all pairs of columns, including each attribute and the objective variable (Price).
>>> boston_df.corr()
CRIM ZN INDUS CHAS NOX RM AGE \
CRIM 1.000000 -0.199458 0.404471 -0.055295 0.417521 -0.219940 0.350784
ZN -0.199458 1.000000 -0.533828 -0.042697 -0.516604 0.311991 -0.569537
INDUS 0.404471 -0.533828 1.000000 0.062938 0.763651 -0.391676 0.644779
CHAS -0.055295 -0.042697 0.062938 1.000000 0.091203 0.091251 0.086518
NOX 0.417521 -0.516604 0.763651 0.091203 1.000000 -0.302188 0.731470
RM -0.219940 0.311991 -0.391676 0.091251 -0.302188 1.000000 -0.240265
AGE 0.350784 -0.569537 0.644779 0.086518 0.731470 -0.240265 1.000000
DIS -0.377904 0.664408 -0.708027 -0.099176 -0.769230 0.205246 -0.747881
RAD 0.622029 -0.311948 0.595129 -0.007368 0.611441 -0.209847 0.456022
TAX 0.579564 -0.314563 0.720760 -0.035587 0.668023 -0.292048 0.506456
PTRATIO 0.288250 -0.391679 0.383248 -0.121515 0.188933 -0.355501 0.261515
B -0.377365 0.175520 -0.356977 0.048788 -0.380051 0.128069 -0.273534
LSTAT 0.452220 -0.412995 0.603800 -0.053929 0.590879 -0.613808 0.602339
Price -0.385832 0.360445 -0.483725 0.175260 -0.427321 0.695360 -0.376955
DIS RAD TAX PTRATIO B LSTAT Price
CRIM -0.377904 0.622029 0.579564 0.288250 -0.377365 0.452220 -0.385832
ZN 0.664408 -0.311948 -0.314563 -0.391679 0.175520 -0.412995 0.360445
INDUS -0.708027 0.595129 0.720760 0.383248 -0.356977 0.603800 -0.483725
CHAS -0.099176 -0.007368 -0.035587 -0.121515 0.048788 -0.053929 0.175260
NOX -0.769230 0.611441 0.668023 0.188933 -0.380051 0.590879 -0.427321
RM 0.205246 -0.209847 -0.292048 -0.355501 0.128069 -0.613808 0.695360
AGE -0.747881 0.456022 0.506456 0.261515 -0.273534 0.602339 -0.376955
DIS 1.000000 -0.494588 -0.534432 -0.232471 0.291512 -0.496996 0.249929
RAD -0.494588 1.000000 0.910228 0.464741 -0.444413 0.488676 -0.381626
TAX -0.534432 0.910228 1.000000 0.460853 -0.441808 0.543993 -0.468536
PTRATIO -0.232471 0.464741 0.460853 1.000000 -0.177383 0.374044 -0.507787
B 0.291512 -0.444413 -0.441808 -0.177383 1.000000 -0.366087 0.333461
LSTAT -0.496996 0.488676 0.543993 0.374044 -0.366087 1.000000 -0.737663
Price 0.249929 -0.381626 -0.468536 -0.507787 0.333461 -0.737663 1.000000
Regression analysis is then performed on Price using RM and LSTAT as explanatory variables. These two attributes have the largest absolute correlation coefficients with Price (0.695 and -0.738, respectively), that is, the strongest linear relationship with it.
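Picking attributes by the absolute value of their correlation with Price can also be done programmatically rather than by reading the full matrix. The following is a minimal sketch on a small made-up data frame (`toy_df` and its values are hypothetical stand-ins, not the Boston data); the same expression should apply to `boston_df` once it contains the Price column.

```python
import pandas as pd

# Hypothetical toy data standing in for boston_df (values are made up)
toy_df = pd.DataFrame({
    "RM":    [6.5, 6.0, 7.2, 5.8, 6.9],
    "LSTAT": [5.0, 12.0, 4.0, 18.0, 6.0],
    "CHAS":  [0, 1, 0, 0, 1],
    "Price": [24.0, 20.0, 35.0, 15.0, 30.0],
})

# Correlation of every attribute with Price, excluding Price itself
corr_with_price = toy_df.corr()["Price"].drop("Price")

# Order by absolute value so strong negative correlations rank high too
order = corr_with_price.abs().sort_values(ascending=False).index
ranked = corr_with_price.reindex(order)
print(ranked)
```

Sorting on the absolute value matters here because an attribute like LSTAT is negatively correlated with Price and would otherwise be ranked last.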
#Build a data frame of the explanatory variables (RM and LSTAT)
df = pd.DataFrame()
df['RM'] = boston_df['RM']
df['LSTAT'] = boston_df['LSTAT']
X_multi = df
Y_target = boston.target
#Model generation and fitting
lreg = LinearRegression()
lreg.fit(X_multi, Y_target)
a1, a2 = lreg.coef_ #Coefficients for RM and LSTAT
b = lreg.intercept_ #Intercept
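The fitted model corresponds to the plane Price = a1 * RM + a2 * LSTAT + b. As a quick sanity check of that reading, the sketch below fits LinearRegression on synthetic, noise-free data (the names `X_demo`, `y_demo`, `demo_reg`, and the plane 3x1 - 2x2 + 5 are assumptions for illustration, not the Boston data) and confirms that the manual formula reproduces predict().

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (not the Boston dataset): two features on a known plane
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 5.0  # noise-free on purpose

demo_reg = LinearRegression().fit(X_demo, y_demo)
c1, c2 = demo_reg.coef_      # one coefficient per feature column, in column order
c0 = demo_reg.intercept_

# Evaluating the plane by hand gives the same values as predict()
manual = c1 * X_demo[:, 0] + c2 * X_demo[:, 1] + c0
print(np.allclose(manual, demo_reg.predict(X_demo)))
```

The order of coef_ follows the column order of the input, so with a data frame holding RM first and LSTAT second, the first coefficient belongs to RM.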
Use matplotlib's plot_surface() and scatter3D() for drawing.
#3D plot (scatter of the measured values)
x, y, z = np.array(df['RM']), np.array(df['LSTAT']), np.array(Y_target)
fig = plt.figure()
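ax = fig.add_subplot(111, projection='3d') #Axes3D(fig) is no longer attached to the figure automatically in recent matplotlib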
ax.scatter3D(np.ravel(x), np.ravel(y), np.ravel(z), c = 'red')
#3D drawing (drawing of regression plane)
X, Y = np.meshgrid(np.arange(0, 10, 1), np.arange(0, 40, 1))
Z = a1 * X + a2 * Y + b
ax.plot_surface(X, Y, Z, alpha = 0.5) #Specify transparency with alpha
ax.set_xlabel("RM")
ax.set_ylabel("LSTAT")
ax.set_zlabel("Price")
plt.show()
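The regression plane is drawn by evaluating the plane equation on a grid built with np.meshgrid. The sketch below shows only that grid step, with hypothetical coefficients (`a1_demo`, `a2_demo`, `b_demo` are made-up values, not the fitted ones), to make the array shapes passed to plot_surface() explicit.

```python
import numpy as np

# Hypothetical coefficients, for illustration only (not the fitted values)
a1_demo, a2_demo, b_demo = 5.0, -0.6, -1.0

rm_axis = np.arange(0, 10, 1)     # 10 grid points along RM
lstat_axis = np.arange(0, 40, 1)  # 40 grid points along LSTAT
X_grid, Y_grid = np.meshgrid(rm_axis, lstat_axis)

# Element-wise arithmetic evaluates the plane at every (RM, LSTAT) grid point
Z_grid = a1_demo * X_grid + a2_demo * Y_grid + b_demo
print(X_grid.shape, Y_grid.shape, Z_grid.shape)
```

All three arrays share the same 2D shape, with rows following the second argument of meshgrid (LSTAT) and columns the first (RM), which is the layout plot_surface() expects.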