Note that I understand the least squares algorithm. And I wrote it in Python.

Overview

Describes the least squares method. First, calculate with concrete numerical values, then find the general formula, and finally implement it in Python.

What is the least squares method?

Regression analysis is a method of deriving the formula that best fits the sample data and predicting the numerical value of the new data, which is one of the typical methods. Derive a relational expression that minimizes the error between the sample data and the value obtained from the expression.

Concrete example

Suppose you predict the growth of radish. Currently, on the 4th day, we measure how many centimeters of leaves are coming out every day.

Day 1 Day 2 Day 3 Day 4
0.3 1.9 2.8 4.3

I made a graph of this data. The $ x $ axis is the number of days and the $ y $ axis is the leaf length.

radish1.png

Let the hypothetical function be a linear function. The equation of the linear function is

y = ax + b 

So, calculate what the values of $ a $ and $ b $ should be to make a nice function. As an image, find a linear function that fits the data as closely as possible, as shown in the red graph.

radish2.png

First, from the hypothesis function $ y = ax + b $, the value of $ y $ on the 0th to 4th days (the number of days is $ x $) is calculated, and the error from the actual growth data is calculated. (Day 0 is the day when the seeds are buried)

Days(x) Day 0 Day 1 Day 2 Day 3 Day 4
Record 0.0cm 0.3cm 1.9cm 2.8cm 4.3cm
Calculated from the formula(y) b a+b 2a+b 3a+b 4a+b
error(y-Record) b-0 (a+b)-0.3 (2a+b)-1.9 (3a+b)-2.8 (4a+b)-4.3

The error is the length of the pink part of the graph below.

radish2_2.png

Square and then add up so that the error is not negative. Assuming that the total error is $ J $, the calculation is as follows.

J = b^2+((a+b)-0.3)^2+((2a+b)-1.9)^2+((3a+b)-2.8)^2+((4a+b)-4.3)^2\\
\\
  =30a^2+20ab-59.4a+5b^2-18.6b+30.03

Find $ a $ and $ b $ that minimize this $ J $ and you're done.

Partial differentiation of $ J $ as a function of $ a $ is 0, and partial differentiation of $ J $ as a function of $ b $ is 0. If two simultaneous equations are solved, $ J $ is the minimum. The following $ a $ and $ b $ are calculated.

See here for the minimum value of a two-variable two-dimensional function. http://mathtrain.jp/quadratic

The above equation is partially differentiated with $ a $ ($ \ frac {∂J} {∂a} $), and the lower equation is partially differentiated with $ b $ ($ \ frac {∂J} {∂b). } $).

\left\{
\begin{array}{ll}
0 = 60a+20b-59.4 \\
0 = 20a+10b-18.6 
\end{array}
\right.

When you solve the simultaneous equations,

\left\{
\begin{array}{ll}
a = 1.11 \\
b=-0.36
\end{array}
\right.

Next door

y = 1.11x - 0.36 

I was able to derive the function. After drawing with a graph

radish3_ans.png

It feels very good !!

radish_plot.py


import matplotlib.pyplot as plt
import numpy as np
# data set
plt.plot([0,0.3,1.9,2.8,4.3], "bo") 

plt.title("radish growth")
plt.xlabel("X:days")
plt.ylabel("Y:length")
plt.xlim([0,4])
plt.xticks(np.arange(1,5,1))
plt.ylim([0,4])
plt.yticks(np.arange(0,6,1))

# Graph
x = np.arange(0,10,0.01)
y = 1.11*x -0.36
plt.plot(x,y,"r-")    
plt.show()              

General formula

In the concrete example, the hypothetical function is a linear function, but a quadratic function or a cubic function may be more suitable. Maybe it's usually a more complex relation.

If you fit the example of radish to a quadratic function,

radish4.png

And this may be considered a better formula. Also, the growth of radish may be affected not only by the number of days, but also by the hours of sunshine and the amount of water.

Therefore, let's consider a concrete example by converting it into a general formula.

If there are $ x_ {1} $, $ x_ {2} $ factors that determine the length of the leaves, such as the number of days, the hours of sunshine, the amount of water to be given, and $ \ cdots $$ n $, respectively. , $ x_ {3} $ $ \ cdots $ $ x_ {n} $. Let the coefficient of $ x_ {n} $ be $ \ theta_ {n} $.

The function and hypothesis function that you want to finally derive can be expressed by the following formula. ($ X_ {1} $ doesn't mean $ x $ squared, it's a function that depends on $ x $. It might be helpful to look at the last Python code.)

h_{\theta}(x) = \theta_{0}x_{0} + \theta_{1}x_{1} + \theta_{2}x_{2} +\cdots + \theta_{n}x_{n}

$ \ Theta_ {0} $ is the $ b $ part of the concrete example and is considered as $ x_ {0} = 1 $. Suddenly, $ h_ {\ theta} (x) $ came out, but $ y $ in the concrete example is the length of the leaf.

By associating the concrete example function $ y = 1.11x --0.36 $ with the above hypothetical function, it becomes as follows.

y = -0.36 × 1 + 1.11 × x
h_{\theta}(x) = \theta_{0} × x_{0} + \theta_{1} × x_{1}

Suppose there are multiple sample data for this hypothesis function, and the data is as follows.

x_{0}
(1 fixed)
x_{1}
(Days)
x_{2}
(Daylight hours h)
x_{3}
(Moisture amount ml)
\cdots y
(Leaf length cm)
1 1 14 150 \cdots 0.3
1 2 14 100 \cdots 1.9
1 3 14.1 160 \cdots 2.8
1 4 14.1 160 \cdots 4.3

The data in the $ j $ column of the $ i $ line can be represented as $ x_ {j} ^ {(i)} $, $ y ^ {(i)} $, and when mapped, it looks like this. It is a shape. It looks like an exponent, but it only shows the data.

x_{0}
(1 fixed)
x_{1}
(Days)
x_{2}
(Daylight hours h)
x_{3}
(Moisture amount ml)
\cdots y
(Leaf length cm)
x_{0}^{(1)} x_{1}^{(1)} x_{2}^{(1)} x_{3}^{(1)} \cdots y^{(1)}
x_{0}^{(2)} x_{1}^{(2)} x_{2}^{(2)} x_{3}^{(2)} \cdots y^{(2)}
x_{0}^{(3)} x_{1}^{(3)} x_{2}^{(3)} x_{3}^{(3)} \cdots y^{(3)}
x_{0}^{(4)} x_{1}^{(4)} x_{2}^{(4)} x_{3}^{(4)} \cdots y^{(4)}

When the data in the first column is assigned to $ h_ {\ theta} (x) $, it is expressed as $ h_ {\ theta} (x ^ {(1)}) $.

In this way, the total error with the hypothetical function can be rewritten as the following function. $ m $ is the number of datasets. In the example of the above table, it is 4.

J(\theta_{0},\theta_{1}, \cdots,\theta_{n})= \sum_{i=1}^{m} (h_{\theta}(x^{(i)})-y^{(i)})^2 

The total error $ J (\ theta_ {0}, \ theta_ {1}, \ cdots, \ theta_ {n}) $ is the smallest $ \ theta_ {0}, \ theta_ {1}, \ cdots, \ theta_ The goal is to derive a set of {n} $.

From here, let's replace it with a matrix. A matrix is a powerful item that can calculate data at once.

First, in order to represent the hypothesis function $ h_ {\ theta} (x) $ as a matrix, $ \ theta_ {0} $, $ \ theta_ {1} $, $ \ cdots $ $ \ theta_ {n} $, $ X_ {0} $, $ x_ {1} $, $ \ cdots $ $ x_ {n} $ is represented by a matrix.

\theta=\begin{bmatrix}
\theta_{0} \\
\theta_{1} \\
\vdots\\
\theta_{n} 
\end{bmatrix}
,
  x=\begin{bmatrix}
x_{0} \\
x_{1} \\
\vdots\\
x_{n} 
\end{bmatrix}

Placed this way, $ h_ {\ theta} (x) $ can be represented by the product of the transposed matrix of $ \ theta $ and the matrix $ x $.

h_{\theta}(x)= \theta^T x\\

See here for the transposition and stacking of matrices. http://www.sist.ac.jp/~kanakubo/research/hosoku/trans_gyoretu.html

The data set $ x ^ {(i)} $ in the $ i $ column is represented by a matrix, and the matrix $ X $ given all the datasets is represented by the transposed matrix of each column. $ y ^ {(i)} $ is also represented by a matrix.

x^{(i)}=\begin{bmatrix}
x_{0}^{(i)} \\
x_{1}^{(i)} \\
\vdots\\
x_{n}^{(i)}
\end{bmatrix}
,  
X=\begin{bmatrix}
(x^{(1)})^T \\
(x^{(2)})^T  \\
\vdots\\
(x^{(m)})^T 
\end{bmatrix}
=
\begin{bmatrix}
x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)} \\
x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)} \\
\vdots & \vdots & & \vdots\\
x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \\
\end{bmatrix}
,  
y=\begin{bmatrix}
y^{(1)} \\
y^{(2)} \\
\vdots\\
y^{(m)}
\end{bmatrix}

It looks confusing, but I just converted the above table into a matrix.

And the story blows away at once.

When placed in this way, $ \ theta $, which minimizes the total error $ J (\ theta) $, can be obtained by solving the following determinant.

\theta =
(X^tX)^{-1}X^ty

See here for why this formula is used. http://mathtrain.jp/leastsquarematrix

Also, $ X ^ {-1} $ is called the inverse matrix of $ X $, refer to here. http://mathtrain.jp/inversematrix

Put the dataset in this formula to find $ \ theta $, and put it in $ \ theta_ {0}, \ theta_ {1}, \ cdots, \ theta_ {n} $ of $ h (\ theta) $ to minimize it. The function expression (model function) by the square method is completed.

I wrote it in Python

I wrote this least squares method in Python. You'll be asked for $ \ theta $ in just one line! !!

First is the dataset. Random numbers are added to the function $ y = 3 + 2 \ cos (x) + \ frac {1} {2} x $ to generate an error.

linalg_lstsq.py


x = arange(-3, 10, 0.1)
y = 3 + 2 * np.cos(x) + (1/2) * x + np.random.normal(0.0, 1.0, len(x)) 

In the general formula, $ x_ {1} $ is $ \ cos (x) $ and $ x_ {2} $ is $ x $. If the coefficients to be calculated can be calculated so that $ \ theta_ {0} $ is $ 3 $, $ \ theta_ {1} $ is $ 2 $, and $ \ theta_ {2} $ is $ \ frac {1} {2} $. It's OK.

Create a matrix $ X $ with all the datasets.

linalg_lstsq.py


n = 3
X = np.zeros((len(x), n), float)
X[:,0] = 1
X[:,1] = np.cos(x)
X[:,2] = x

The expression for $ \ theta $ is this one line using linalg.lstsq in the Numpy library.

linalg_lstsq.py


(theta, residuals, rank, s) = linalg.lstsq(X, y)

https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html

As a result, it became like this.

linalg_lstsq.py


import numpy as np
import matplotlib.pyplot as plt

# data-set
x = np.arange(-3, 10, 0.1)
y = 3 + 2 * np.cos(x) + (1/2) * x + np.random.normal(0.0, 1.0, len(x)) 

n = 3
X = np.zeros((len(x), n), float)
X[:,0] = 1
X[:,1] = np.cos(x)
X[:,2] = x

# Least square method
(theta, residuals, rank, s) = np.linalg.lstsq(X, y)

# data-set plot
plt.plot(x, y, 'b.')
# h(theta) plot
plt.plot(x, theta[0] + theta[1] * X[:,1] + theta[2] * X[:,2], 'r-')

plt.title("linalg.lstsq")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")

plt.show()

print('θ[0]: %s' % theta[0])
print('θ[1]: %s' % theta[1])
print('θ[2]: %s' % theta[2])

lstsqrs3.png lstsqrs_2-3.png

I was able to get a fairly close value, and even if you look at the graph, it fits! !!

end

Since I have just started studying machine learning, I am learning the standard algorithms and ideas. I can't write anything brand new, but I hope it helps someone.

30d067ead81320ed8a9112ee25b96762_s.jpg

Recommended Posts

Note that I understand the least squares algorithm. And I wrote it in Python.
Note that I understand the algorithm of the machine learning naive Bayes classifier. And I wrote it in Python.
I tried the least squares method in Python
I wrote the queue in Python
I wrote the stack in Python
I wrote it in Go to understand the SOLID principle
I set the environment variable with Docker and displayed it in Python
A memo that I wrote a quicksort in Python
I wrote a class in Python3 and Java
[Fundamental Information Technology Engineer Examination] I wrote the algorithm of Euclidean algorithm in Python.
Carefully understand the exponential distribution and draw in Python
Plot and understand the multivariate normal distribution in Python
Carefully understand the Poisson distribution and draw in Python
I wrote python in Japanese
I want to replace the variables in the python template file and mass-produce it in another file.
I understand Python in Japanese!
A note that runs an external program in Python and parses the resulting line
Note that it supports Python 3
[Python] I installed the game from pip and played it
I tried programming the chi-square test in Python and Java.
I wrote a script that splits the image in two
I wrote python3.4 in .envrc with direnv and allowed it, but I got a syntax error
Sorting algorithm and implementation in Python
I wrote Fizz Buzz in Python
Movement that changes direction in the coordinate system I tried Python 3
I wrote the code to write the code of Brainf * ck in python
[python] A note that started to understand the behavior of matplotlib.pyplot
Note that Python decodes base64 format image and saves it locally
I bought and analyzed the year-end jumbo lottery with Python that can be executed in Colaboratory
I tried to find out the difference between A + = B and A = A + B in Python, so make a note
I wrote a tri-tree that can be used for high-speed dictionary implementation in D language and Python.
[Fundamental Information Technology Engineer Examination] I wrote an algorithm for the maximum value of an array in Python.
[Python] Sweet Is it sweet? About suites and expressions in the official documentation
The one that divides the csv file, reads it, and processes it in parallel
I made a Discord bot in Python that translates when it reacts
I wrote the selection sort in C
The file name was bad in Python and I was addicted to import
[Python beginner] I collected the articles I wrote
Find it in the procession and edit it
I wrote the sliding wing in creation.
I thought it was the same as python, and I was addicted to the problem that the ruby interpreter did not start.
Note that cibuildwheel builds python bwheel (including C ++ module) in bulk with CI and uploads it to PyPI
[Python] Precautions when retrieving data by scraping and putting it in the list
[Fundamental Information Technology Engineer Examination] I wrote a linear search algorithm in Python.
Talking about the features that pandas and I were in charge of in the project
I compared the speed of regular expressions in Ruby, Python, and Perl (2013 version)
Somehow the code I wrote worked and I was impressed, so I will post it
I wrote a PyPI module that extends the parameter style in Python's sqlite3 module
I want to get the file name, line number, and function name in Python 3.4
Examples and solutions that the Python version specified in pyenv does not run
[Python] I introduced Word2Vec and played with it.
When I try matplotlib in Python, it says'cairo.Context'
I tried simulating the "birthday paradox" in Python
I wrote "Introduction to Effect Verification" in Python
About the difference between "==" and "is" in python
[Note] About the role of underscore "_" in Python
I wrote the hexagonal architecture in go language
I implemented the inverse gamma function in python
POST JSON in Python and receive it in PHP
The one that displays the progress bar in Python
I want to display the progress in Python!