In this post, I perform least squares fitting of a response variable y explained by a single variable x, using Python's numerical library numpy.
First, numpy is used to generate noisy cubic data and plot it.
Drawing a graph
#Module import
import numpy as np
import matplotlib.pyplot as plt
#Explanatory variable (1D)
x = np.arange(-3, 7, 0.5)
#Response variable (a cubic function of the explanatory variable, with coefficients drawn at random)
y = (10*np.random.rand() + x*np.random.rand()
     + 2*np.random.rand()*x**2 + x**3)
#drawing
plt.scatter(x,y)
plt.show()
In the least squares method, the squared L2 norm of the difference between the data and the predicted values is interpreted as the error, and we find the coefficients of the regression curve that minimize it. If the data to be predicted are $ y $ and the regression values are $ p $, the error is
Error = \sum_{i=1}^{N}(y_i - p_i)^2
It will be. Minimizing this error is the goal of least squares regression.
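As a minimal sketch of this error, the sum of squared residuals can be computed directly with numpy. The data and predicted values below are made-up illustrative numbers, not taken from the post.

```python
import numpy as np

#Illustrative observed data and predictions from some regression line (assumed values)
y = np.array([1.0, 2.1, 2.9, 4.2])  #observed data
p = np.array([1.1, 2.0, 3.0, 4.0])  #predicted values
#Sum of squared residuals = squared L2 norm of (y - p)
error = np.sum((y - p)**2)
print(error)  #0.07
```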
Also, $ p_i $ is expressed by an n-th order polynomial as follows.
Linear expression\\
p_i = a_1 x_i + a_0 \\
Quadratic expression\\
p_i = a_2 x_i^2 + a_1 x_i + a_0 \\
Cubic expression\\
p_i = a_3 x_i^3 + a_2 x_i^2 + a_1 x_i + a_0 \\
n-th order expression\\
p_i = a_n x_i^n + \cdots + a_2 x_i^2 + a_1 x_i + a_0\\
This time, I would like to find the coefficients of the fitted polynomial using numpy's polyfit function. Note that polyfit returns the coefficients in order of decreasing degree, $ A = (a_n, ..., a_1, a_0) $. The coefficients can then be substituted into the n-th order expression to obtain the regression equation, but when that becomes tedious, it is also convenient to use the poly1d function.
Fitting and drawing the n-th order regression curves obtained in this way
#Linear expression
coef_1 = np.polyfit(x,y,1) #coefficient
y_pred_1 = coef_1[0]*x+ coef_1[1] #Fitting function
#Quadratic expression
coef_2 = np.polyfit(x,y,2)
y_pred_2 = coef_2[0]*x**2+ coef_2[1]*x + coef_2[2]
#Third-order formula
coef_3 = np.polyfit(x,y,3)
y_pred_3 = np.poly1d(coef_3)(x) #np.poly1d automatically applies the obtained coefficients coef_3 to the formula
#drawing
plt.scatter(x,y,label="raw_data") #original data
plt.plot(x,y_pred_1,label="d=1") #Linear expression
plt.plot(x,y_pred_2,label="d=2") #Quadratic expression
plt.plot(x,y_pred_3,label = "d=3") #Third-order formula
plt.legend(loc="upper left")
plt.title("least square fitting")
plt.show()
This time, the fit by the cubic expression seems good. The higher the order, the smaller the error tends to be, but be careful of overfitting, where the model depends only on the particular dataset it was fitted to.
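The tendency of the residual error to shrink as the degree grows can be checked numerically. The sketch below assumes noisy cubic data similar to the one generated above (with a fixed seed and assumed coefficients) and compares the residual error of polyfit fits of increasing degree.

```python
import numpy as np

#Assumed setup: noisy cubic data with a fixed seed, for reproducibility
rng = np.random.default_rng(0)
x = np.arange(-3, 7, 0.5)
y = 1 + 2*x + 3*x**2 + x**3 + rng.normal(scale=5.0, size=x.size)

#Residual error (sum of squared residuals) for fits of increasing degree
resids = {}
for d in (1, 2, 3, 6):
    coef = np.polyfit(x, y, d)
    resids[d] = np.sum((y - np.poly1d(coef)(x))**2)
print(resids)
```

The residual error never increases with the degree, since each lower-degree polynomial is a special case of the higher-degree one; a small training error at high degree therefore does not by itself indicate a good model.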