[PYTHON] Regression model and its visualization using scikit-learn

One of the books I introduced yesterday, SciPy and NumPy: Optimizing & Boosting your Python Programming, includes an example of regression with scikit-learn.

Computing a regression plane from 3D data

First, import the libraries needed for drawing 3D plots.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Use scikit-learn's LinearRegression
from sklearn import linear_model
# Use a synthetic dataset generator for regression
# (the old sklearn.datasets.samples_generator module is gone in recent scikit-learn releases)
from sklearn.datasets import make_regression

Generating training data from a sample dataset

scikit-learn ships a generator for sample regression data, so let's use it to create the data we will fit.

#Generate synthetic data for training and testing
X, y = make_regression(n_samples=100, n_features=2, n_informative=1,
                       random_state=0, noise=50)
# =>
# [[ 1.05445173 -1.07075262]
#  [-0.36274117 -0.63432209]
# ...
#  [-0.17992484  1.17877957]
#  [-0.68481009  0.40234164]]
# [  -6.93224214   -4.12640648   29.47265153  -12.03166314 -121.67258636
#  -149.24989393  113.53496654   -7.83638906  190.00097568   49.48805247
# ...
#   246.92583786  171.84739934  -33.55917696   38.71008939  -28.23999523
#    39.5677481  -168.02196071 -201.18826919   69.07078178  -36.96534574]
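
To get a better feel for what make_regression returns, the following minimal sketch repeats the call above with one extra argument, coef=True, which also returns the ground-truth coefficients used to generate y before noise is added. The variable name true_coef is my own and is not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_regression

# Same settings as above, plus coef=True to also get the underlying
# coefficients of the linear model that generated y (before noise).
X, y, true_coef = make_regression(n_samples=100, n_features=2, n_informative=1,
                                  random_state=0, noise=50, coef=True)

print(X.shape)    # (100, 2): 100 samples with 2 features each
print(y.shape)    # (100,): one target value per sample
print(true_coef)  # only one entry is non-zero because n_informative=1
```

Since n_informative=1, only one of the two features actually influences y; the other contributes nothing except through noise in the fit.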

Separation of training data and test data

Split the generated data 80:20 into training and test sets.

X_train, X_test = X[:80], X[-20:]
y_train, y_test = y[:80], y[-20:]
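
The slicing above can also be written with scikit-learn's train_test_split helper. This is only an alternative sketch, not what the original example uses; passing shuffle=False and test_size=20 is my choice so that the result matches the plain slices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=2, n_informative=1,
                       random_state=0, noise=50)

# shuffle=False keeps the original order, so the first 80 rows become the
# training set and the last 20 rows the test set, just like X[:80] / X[-20:].
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=20, shuffle=False)

print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)
```

In practice the default shuffle=True is usually preferable, because it avoids any ordering effects in the data.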

Model training

With the data ready, we can train. First create an instance of the regression model, then train it with the familiar .fit method.

regr = linear_model.LinearRegression()
# Train the model
regr.fit(X_train, y_train)

# Display the estimated coefficients
print(regr.coef_)
#=> [-10.25691752 90.5463984 ]
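
LinearRegression performs ordinary least squares, so the fitted coefficients can be cross-checked against a direct least-squares solve in NumPy. The sketch below is only a sanity check under that assumption; the names A and w are my own.

```python
import numpy as np
from sklearn import linear_model
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=2, n_informative=1,
                       random_state=0, noise=50)
X_train, y_train = X[:80], y[:80]

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)

# Append a column of ones so that the last least-squares weight
# plays the role of the intercept.
A = np.column_stack([X_train, np.ones(len(X_train))])
w, *_ = np.linalg.lstsq(A, y_train, rcond=None)

print(regr.coef_, regr.intercept_)  # scikit-learn's estimates
print(w[:2], w[2])                  # the same values from NumPy
```

Note that coef_ holds only the slopes; the constant term is stored separately in intercept_.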

Value prediction

Next, use the trained model to predict y for a new input.

# predict expects a 2D array of shape (n_samples, n_features)
X1 = np.array([[1.2, 4]])
print(regr.predict(X1)[0])
#=> 350.860363861
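
For a linear model, the prediction is just the dot product of the input with the coefficients plus the intercept. The following sketch reproduces predict by hand to make that explicit; x_new and manual are illustrative names of my own.

```python
import numpy as np
from sklearn import linear_model
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=2, n_informative=1,
                       random_state=0, noise=50)
regr = linear_model.LinearRegression()
regr.fit(X[:80], y[:80])

# One sample with two features, as a 2D array of shape (1, 2).
x_new = np.array([[1.2, 4.0]])

pred = regr.predict(x_new)                       # model prediction
manual = x_new @ regr.coef_ + regr.intercept_    # y = w1*x1 + w2*x2 + b

print(pred, manual)
```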

Evaluation

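Let's evaluate the model on the test data. For LinearRegression, the .score method returns the coefficient of determination R^2, where 1.0 is a perfect fit.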

print(regr.score(X_test, y_test))
#=> 0.949827492261
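
The score above is the coefficient of determination, R^2 = 1 - SS_res / SS_tot. As a rough sketch, it can be computed by hand and compared with both .score and sklearn.metrics.r2_score; the variable names below are my own.

```python
import numpy as np
from sklearn import linear_model
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=100, n_features=2, n_informative=1,
                       random_state=0, noise=50)
X_train, X_test = X[:80], X[-20:]
y_train, y_test = y[:80], y[-20:]

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)

# Residual sum of squares and total sum of squares on the test set.
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual)
print(regr.score(X_test, y_test))
print(r2_score(y_test, y_pred))
```

All three printed values should agree up to floating-point rounding.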

Visualization

Numbers alone are not very intuitive, so let's finish by visualizing the result.

fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot(111, projection='3d')
# ax = Axes3D(fig)

# Data
ax.scatter(X_train[:, 0], X_train[:, 1], y_train, facecolor='#00CC00')
ax.scatter(X_test[:, 0], X_test[:, 1], y_test, facecolor='#FF7800')

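coef = regr.coef_
intercept = regr.intercept_
# Regression plane; include the intercept so the surface matches regr.predict
plane = lambda x1, x2: coef[0] * x1 + coef[1] * x2 + intercept

grid_x1, grid_x2 = np.mgrid[-2:2:10j, -2:2:10j]
ax.plot_surface(grid_x1, grid_x2, plane(grid_x1, grid_x2),
                alpha=0.1, color='k')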
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
ax.zaxis.set_visible(False)

fig.savefig('image.png', bbox_inches='tight')

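[Figure: image.png, showing the training data (green), the test data (orange), and the fitted regression plane]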

The regression plane has been found. The data points appear to lie close to it.
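
The heights of the regression plane on a grid can also be obtained from the model itself instead of a hand-written formula. The sketch below flattens the grid into (n_samples, 2) rows, calls predict, and reshapes the result back; grid_points and grid_z are names I introduce for illustration, and no plotting is done here.

```python
import numpy as np
from sklearn import linear_model
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=2, n_informative=1,
                       random_state=0, noise=50)
regr = linear_model.LinearRegression()
regr.fit(X[:80], y[:80])

# 10 x 10 grid over the range [-2, 2] on both feature axes.
grid_x1, grid_x2 = np.mgrid[-2:2:10j, -2:2:10j]

# Flatten the grid to rows of (x1, x2), predict, and restore the grid shape.
grid_points = np.column_stack([grid_x1.ravel(), grid_x2.ravel()])
grid_z = regr.predict(grid_points).reshape(grid_x1.shape)

print(grid_z.shape)  # (10, 10)
```

The resulting grid_z can be passed to ax.plot_surface(grid_x1, grid_x2, grid_z, ...) and automatically includes the intercept.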

Summary

Using the sample dataset, I was able to draw a clean regression plane. Real problems may not work out this neatly, but it helps to have a firm grasp of the theory.
