Explanation of the concept of regression analysis using Python, Part 2

Searching for the values of α and β

Getting a feel for it with a graph

In [Part 1](http://qiita.com/kenmatsu4/items/8b4e908d7c93d046110d) of this article, we saw that if one of $ \alpha $ and $ \beta $ is fixed at some value, the minimum with respect to the other can be found. However, to actually find the parameters $ \alpha, \beta $ of the approximating straight line (regression line) for this data, we must find the point where $ S $ is minimized with respect to $ \alpha $ and $ \beta $ at the same time.

Written as a function of the two variables $ \alpha $ and $ \beta $, the function $ S $ from [Part 1](http://qiita.com/kenmatsu4/items/8b4e908d7c93d046110d#Let's calculate) looks like this:


S(\alpha, \beta) = 
\left( \sum_i^n x_i^2 \right) \alpha^2 + n\beta^2
+ 2 \left( \sum_i^n x_i \right)\alpha \beta
- 2 \left( \sum_i^n x_i y_i \right)\alpha 
- 2 \left( \sum_i^n y_i \right)\beta
+ \sum_i^n y_i^2
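As a quick sanity check of this expansion, here is a minimal sketch that compares the expanded form with the direct definition of $ S $ as the sum of squared residuals $ y_i - (\alpha x_i + \beta) $. The four data points are made up purely for the check, and the helper names `s_coefficients`, `s_expanded` and `s_direct` are hypothetical.

```python
import numpy as np

def s_coefficients(x, y):
    """Coefficients of S(alpha, beta), ordered as
    [alpha^2, beta^2, alpha*beta, alpha, beta, constant]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    return np.array([
        np.sum(x**2),        # coefficient of alpha^2
        n,                   # coefficient of beta^2
        2 * np.sum(x),       # coefficient of alpha*beta
        -2 * np.sum(x * y),  # coefficient of alpha
        -2 * np.sum(y),      # coefficient of beta
        np.sum(y**2),        # constant term
    ])

def s_expanded(coef, alpha, beta):
    """Evaluate S using the expanded quadratic form."""
    c_aa, c_bb, c_ab, c_a, c_b, c0 = coef
    return c_aa * alpha**2 + c_bb * beta**2 + c_ab * alpha * beta + c_a * alpha + c_b * beta + c0

def s_direct(x, y, alpha, beta):
    """Evaluate S directly as the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((y - (alpha * x + beta))**2)

# Hypothetical toy data, only for checking the algebra
x_toy = [1.0, 2.0, 4.0, 5.0]
y_toy = [1.5, 2.0, 4.5, 4.0]
coef = s_coefficients(x_toy, y_toy)

for alpha, beta in [(0.0, 0.0), (0.7, -0.5), (1.3, 2.0)]:
    # both columns should agree for every (alpha, beta)
    print(s_expanded(coef, alpha, beta), s_direct(x_toy, y_toy, alpha, beta))
```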

Each coefficient can be computed from the data (as calculated in [Part 1](http://qiita.com/kenmatsu4/items/8b4e908d7c93d046110d#Let's calculate)):

n=50
\left( \sum_i^n x_i^2 \right) =34288
\left( \sum_i^n y_i^2 \right)=11604
\left( \sum_i^n x_iy_i \right)=18884
\left( \sum_i^n x_i \right)=1240
\left( \sum_i^n y_i \right)=655

Substituting these values gives


S(\alpha, \beta) = 
34288 \alpha^2 + 50\beta^2
+ 2480\alpha \beta
- 37768\alpha 
- 1310 \beta
+ 11604

Let's plot this quadratic function of two variables as a surface. The vertical axis $ S $ is the sum of the squared vertical distances (errors) between each data point and the straight line, and we want to find the point where it is smallest.


from mpl_toolkits.mplot3d.axes3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm

# set field
X = np.linspace(0.2, 1.3, 100)
Y = np.linspace(-25, 15, 100)

# set data
#sum(x**2)
sum_x_2 = 34288.2988
#sum(y**2)
sum_y_2 = 11603.8684051
#dot(x,y)
sum_xy = 18884.194896
#sum(x)
sum_x = 1239.7
#sum(y)
sum_y = 655.0152
# grid of (alpha, beta) candidates
X, Y = np.meshgrid(X, Y)
n = 50  # number of data points
# S(α,β) = Σx²·α² + n·β² + 2Σx·αβ − 2Σxy·α − 2Σy·β + Σy²
S = (sum_x_2 * X**2) + (n * Y**2) + (2 * sum_x * X * Y) - (2 * sum_xy * X) - (2 * sum_y * Y) + sum_y_2

# prepare plot
fig = plt.figure(figsize=(18,6))
ax = fig.add_subplot(121, projection='3d', azim=60)
ax.set_xlabel("alpha")
ax.set_ylabel("beta")
ax.set_zlabel("S")

# draw 3D graph
surf = ax.plot_surface(X, Y, S, rstride=1, cstride=1, cmap=cm.coolwarm,
        linewidth=0, antialiased=False)

# draw contour
ax = fig.add_subplot(122)
plt.contour(X,Y,S,50)
ax.set_xlabel("alpha")
ax.set_ylabel("beta")

plt.show()

[Figure: 3D surface plot of S(α, β) (left) and contour plot of S over alpha and beta (right)]

The 3D plot on the left is a little hard to read, but in the contour plot on the right the minimum, at the center of the ellipses, appears to be close to $ \alpha = 0.74, \beta = -5 $, the values we predicted by eye earlier. This looks promising!
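To back up the visual impression with a number, here is a minimal sketch of a brute-force grid search. It reuses the unrounded sums from the plotting code above, evaluates $ S $ on a finer 1000 × 1000 grid over the same ranges, and picks the smallest value with `np.argmin`. By my arithmetic the grid minimum should land near $ \alpha \approx 0.744, \beta \approx -5.36 $; the exact point depends on the grid spacing.

```python
import numpy as np

# Unrounded sums, as in the plotting code (speed in km/h, distance in m)
n = 50
sum_x_2 = 34288.2988
sum_y_2 = 11603.8684051
sum_xy = 18884.194896
sum_x = 1239.7
sum_y = 655.0152

def S(alpha, beta):
    """Sum of squared errors as a quadratic function of alpha and beta."""
    return (sum_x_2 * alpha**2 + n * beta**2 + 2 * sum_x * alpha * beta
            - 2 * sum_xy * alpha - 2 * sum_y * beta + sum_y_2)

# A finer grid over the same ranges as the plot
alphas = np.linspace(0.2, 1.3, 1000)
betas = np.linspace(-25, 15, 1000)
A, B = np.meshgrid(alphas, betas)
values = S(A, B)

# Row/column index of the smallest S on the grid
i, j = np.unravel_index(np.argmin(values), values.shape)
alpha_grid, beta_grid = A[i, j], B[i, j]
print(alpha_grid, beta_grid, values[i, j])
```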

Solving it by calculation

To solve it by calculation, we take the partial derivatives of $ S $ with respect to $ \alpha $ and $ \beta $ and find the values at which both become 0. Written as a function of $ \alpha $ alone and of $ \beta $ alone, $ S $ becomes:


S(\alpha) = \left( \sum_i^n x_i^2 \right) \alpha^2
 + 2\left( \sum_i^n (x_i\beta - x_i y_i ) \right) \alpha 
 + n\beta^2 - 2\beta\sum_i^n y_i + \sum_i^n y_i^2

S(\beta) = n\beta^2
+ 2 \left( \sum_i^n (x_i\alpha - y_i) \right) \beta
+ \alpha^2\sum_i^n x_i^2 - 2\alpha \sum_i^n x_iy_i + \sum_i^n y_i^2
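Since $ S(\alpha) $ and $ S(\beta) $ are ordinary parabolas, each one-variable minimum sits at the vertex $ -b/(2a) $. The sketch below (hypothetical helper names `best_alpha` and `best_beta`, using the rounded sums from the data) shows the "fix one, minimize the other" idea from Part 1: the answer depends on the value held fixed, which is why we need a point where both conditions hold at once.

```python
# Rounded sums from the data (speed in km/h, distance in m)
n, sum_x2, sum_x, sum_xy, sum_y = 50, 34288, 1240, 18884, 655

def best_alpha(beta):
    """Vertex of the parabola S(alpha) when beta is held fixed."""
    # S(alpha) = sum_x2 * alpha^2 + 2 * (beta * sum_x - sum_xy) * alpha + const,
    # so the vertex -b / (2a) is (sum_xy - beta * sum_x) / sum_x2
    return (sum_xy - beta * sum_x) / sum_x2

def best_beta(alpha):
    """Vertex of the parabola S(beta) when alpha is held fixed."""
    # S(beta) = n * beta^2 + 2 * (alpha * sum_x - sum_y) * beta + const
    return (sum_y - alpha * sum_x) / n

# Each result depends on the value that was held fixed
print(best_alpha(-5.0))  # alpha minimizing S when beta = -5
print(best_beta(0.74))   # beta minimizing S when alpha = 0.74
```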

Setting the partial derivatives with respect to $ \alpha $ and $ \beta $ to 0,


\frac{\partial S}{\partial \alpha} = 0,

\frac{\partial S}{\partial \beta } = 0,

we obtain simultaneous equations to solve. Concretely, we solve


\frac{\partial S}{\partial \alpha} =  2\left(\sum_i^n x_i^2 \right) \alpha +  2\left( \sum_i^n x_i \right) \beta - 2\sum_i^n x_i y_i = 0

\frac{\partial S}{\partial \beta} = 2n\beta + 2\left( \sum_i^n x_i\right) \alpha - 2\sum_i^ny_i = 0

Substituting the values obtained from the data gives

\frac{1}{2}\frac{\partial S}{\partial \alpha} =  34288 \alpha +  1240 \beta - 18884 = 0
\frac{1}{2}\frac{\partial S}{\partial \beta} = 50 \beta + 1240 \alpha - 655 = 0
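As a sanity check on the derivatives, and to get a closed form, here is a small SymPy sketch. It assumes a hypothetical symbolic data set of only $ n = 3 $ points (symbols `x1..x3`, `y1..y3`, chosen just to keep the expressions short), differentiates $ S $ directly from its definition as a sum of squared residuals, and compares the result with the two general equations above. Eliminating $ \beta $ with the second equation gives the familiar closed form $ \alpha = \frac{n\sum x_iy_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2} $, $ \beta = \frac{\sum y_i - \alpha \sum x_i}{n} $, which the sketch also checks; with the rounded sums it gives $ \alpha = 132000/176800 = 165/221 $.

```python
import sympy as sp

# Hypothetical symbolic data set with n = 3 points
n = 3
xs = sp.symbols('x1:4')
ys = sp.symbols('y1:4')
alpha, beta = sp.symbols('alpha beta')

# S as the sum of squared residuals y_i - (alpha*x_i + beta)
S = sum((y - (alpha * x + beta))**2 for x, y in zip(xs, ys))

Sx, Sy = sum(xs), sum(ys)
Sxx = sum(x**2 for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))

# Partial derivatives as written in the text
dS_dalpha_text = 2 * Sxx * alpha + 2 * Sx * beta - 2 * Sxy
dS_dbeta_text = 2 * n * beta + 2 * Sx * alpha - 2 * Sy

print(sp.expand(sp.diff(S, alpha) - dS_dalpha_text))  # 0
print(sp.expand(sp.diff(S, beta) - dS_dbeta_text))    # 0

# Solve both equations and compare with the closed form
sol = sp.solve([dS_dalpha_text, dS_dbeta_text], [alpha, beta], dict=True)[0]
alpha_closed = (n * Sxy - Sx * Sy) / (n * Sxx - Sx**2)
beta_closed = (Sy - alpha_closed * Sx) / n
print(sp.simplify(sol[alpha] - alpha_closed))  # 0
print(sp.simplify(sol[beta] - beta_closed))    # 0

# Closed form evaluated with the rounded sums of this data set
alpha_num = sp.Rational(50 * 18884 - 1240 * 655, 50 * 34288 - 1240**2)
print(alpha_num)  # 165/221
```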

Solving the simultaneous equations with Python (SymPy):

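from sympy import symbols, solve

a, b = symbols('a b')

# 34288α + 1240β − 18884 = 0
#    50β + 1240α −   655 = 0
sol = solve([34288 * a + 1240 * b - 18884, 50 * b + 1240 * a - 655], [a, b])

# solve() returns exact rationals; print them as decimals
for sym in (a, b):
    print(f"{sym} --> {float(sol[sym]):.12g}")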

a --> 0.746606334842
b --> -5.41583710407

That is, we obtain the solution

\alpha = 0.746606334842
\beta  = -5.41583710407
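The same 2 × 2 system can also be solved numerically. The sketch below (hypothetical helper `solve_normal_equations`) uses `numpy.linalg.solve`, once with the rounded sums used in the SymPy call and once with the unrounded sums from the plotting code. Rounding the sums to integers appears to shift the estimate slightly: by my calculation the unrounded sums give roughly $ \alpha \approx 0.7445, \beta \approx -5.358 $.

```python
import numpy as np

def solve_normal_equations(n, sum_x2, sum_x, sum_xy, sum_y):
    """Solve [[Σx², Σx], [Σx, n]] @ [α, β] = [Σxy, Σy] for α and β."""
    A = np.array([[sum_x2, sum_x],
                  [sum_x, n]], dtype=float)
    rhs = np.array([sum_xy, sum_y], dtype=float)
    return np.linalg.solve(A, rhs)

# Rounded sums, as in the SymPy call
alpha_r, beta_r = solve_normal_equations(50, 34288, 1240, 18884, 655)
# Unrounded sums, as in the plotting code
alpha_u, beta_u = solve_normal_equations(50, 34288.2988, 1239.7, 18884.194896, 655.0152)

print(alpha_r, beta_r)
print(alpha_u, beta_u)
```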

Substituting these values of $ \alpha $ and $ \beta $ back in and drawing the line on the scatter plot gives the following.

import numpy as np
import matplotlib.pyplot as plt

# assumed layout of cars.csv: header row, then (row index, speed in mph, distance in ft)
data = np.loadtxt('cars.csv', delimiter=',', skiprows=1)
data[:, 1] *= 1.61    # convert mph to km/h
data[:, 2] *= 0.3048  # convert ft to m

fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(111)
ax.set_xlim(0, 50)
ax.set_title("Stopping Distances of Cars with estimated regression line")
ax.set_xlabel("speed(km/h)")
ax.set_ylabel("distance(m)")
ax.scatter(data[:, 1], data[:, 2])

# regression line y = αx + β with the values found above
alpha, beta = 0.746606334842, -5.41583710407
x = np.linspace(0, 50, 50)
ax.plot(x, alpha * x + beta)
plt.show()

[Figure: Stopping Distances of Cars with estimated regression line, speed (km/h) vs. distance (m)]

And this is the regression line :smile:
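Once $ \alpha $ and $ \beta $ are known, using the line for prediction is just arithmetic. A minimal sketch (the function name `predict_distance` is hypothetical); it also illustrates that the line passes through the mean point $ (\sum x_i / n, \sum y_i / n) = (24.8, 13.1) $, which follows directly from $ \partial S / \partial \beta = 0 $.

```python
# Coefficients found above (speed in km/h -> stopping distance in m)
ALPHA = 0.746606334842
BETA = -5.41583710407

def predict_distance(speed_kmh):
    """Stopping distance (m) predicted by the regression line y = αx + β."""
    return ALPHA * speed_kmh + BETA

# The line passes through the mean point (1240/50, 655/50),
# a consequence of the second equation n·β + Σx·α = Σy.
print(predict_distance(1240 / 50))  # about 13.1
print(predict_distance(40))         # about 24.45
```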

In the next part, I will continue the explanation with an animation.
