Hello. This time I wrote an article about how to implement the least squares method in Python. As I mentioned before, you may find it hard to follow without the following mathematical knowledge:
・ Basic functions such as linear and quadratic functions
・ Differentiation and partial differentiation
・ Summation (Σ) and averages
If you have high-school-level mathematics, there should be no problem. So let's see how it works.
Suppose you have data like this. (Graph below.) If someone told you to "draw the linear function that minimizes the error from each point," it would be impossible to do by eye. The method that makes this possible is the **least squares method**. It can be used when the measured data $y$ is the sum of a model function (the underlying function) $f(x)$ and an error $\varepsilon$. The formula is as follows.
y = f(x) + \varepsilon
This means that the least squares method cannot be used on purely random data.
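To make this assumption concrete, here is a minimal sketch of data that does satisfy $y = f(x) + \varepsilon$ (the model $f(x) = 2x + 1$ and the noise level are made up for illustration):

```python
import numpy as np

# Hypothetical illustration: the true model is f(x) = 2x + 1,
# and each measurement carries Gaussian noise epsilon
rng = np.random.default_rng(0)
x = np.arange(1.0, 10.0)
epsilon = rng.normal(0.0, 1.0, size=x.size)  # measurement error
y = 2 * x + 1 + epsilon                      # y = f(x) + epsilon
print(y)
```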
Now, since this data looks like it can be approximated by a linear function, let's use one as the model function. The formula of a linear function is as follows.
f(x) = ax + b
Find appropriate parameters $a$ and $b$ using the least squares method. First, as the name suggests, find the sum of the squares of the differences between the actual data and the model function. The formula is as follows.
J=\sum_{i=1}^n(y_i-f(x_i))^2=\sum_{i=1}^n(y_i-ax_i-b)^2
The reason for squaring the errors is that if there are both positive and negative errors, they cancel each other out and the fit cannot be evaluated accurately; squaring makes every term non-negative, so this cancellation is prevented.
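As a quick illustration (the numbers here are made up), signed errors can sum to zero even when the fit is poor, while squared errors cannot:

```python
import numpy as np

# Hypothetical residuals: they cancel when summed with sign,
# but the sum of squares still reports the error
errors = np.array([3.0, -3.0, 2.0, -2.0])
print(np.sum(errors))     # 0.0  -- looks like a perfect fit
print(np.sum(errors**2))  # 26.0 -- reveals the actual error
```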
Partially differentiate the sum of squared errors $J$ with respect to $a$ and $b$. The partial derivatives are as follows.
\frac{\partial J}{\partial a}=-2\sum_{i=1}^nx_i(y_i-ax_i-b)=-2(\sum_{i=1}^n x_iy_i - a\sum_{i=1}^n x_i^2 - b\sum_{i=1}^n x_i)\tag{1}
\frac{\partial J}{\partial b}=-2\sum_{i=1}^n(y_i-ax_i-b)=-2(\sum_{i=1}^n y_i - a\sum_{i=1}^n x_i - nb)\tag{2}
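If you want to convince yourself that equations (1) and (2) are right, a finite-difference check is a minimal sketch (the sample data and the test point $(a, b) = (2, 1)$ are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.0, 3.0, 7.0])

def J(a, b):
    return np.sum((y - a * x - b) ** 2)

a, b, h = 2.0, 1.0, 1e-6
# Analytic gradients from equations (1) and (2)
dJda = -2 * np.sum(x * (y - a * x - b))
dJdb = -2 * np.sum(y - a * x - b)
# Numerical gradients for comparison; the pairs should match
print(dJda, (J(a + h, b) - J(a - h, b)) / (2 * h))
print(dJdb, (J(a, b + h) - J(a, b - h)) / (2 * h))
```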
Find $a$ and $b$ such that these two equations equal 0. First, rearrange equation (2) into the form $b = \dots$.
b=\frac{1}{n}\sum_{i=1}^n y_i - \frac{a}{n}\sum_{i=1}^n x_i\tag{3}
Substitute equation (3) into equation (1) so that $a$ is the only remaining variable, and solve for $a$.
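Writing out the intermediate step: setting equation (1) to zero and substituting equation (3) for $b$ gives

\sum_{i=1}^n x_iy_i - a\sum_{i=1}^n x_i^2 - \Bigl(\frac{1}{n}\sum_{i=1}^n y_i - \frac{a}{n}\sum_{i=1}^n x_i\Bigr)\sum_{i=1}^n x_i = 0

Multiplying both sides by $n$ and collecting the terms in $a$:

a\Bigl(n\sum_{i=1}^n x_i^2 - \bigl(\sum_{i=1}^n x_i\bigr)^2\Bigr) = n\sum_{i=1}^n x_iy_i - \sum_{i=1}^n x_i\sum_{i=1}^n y_i

Dividing through by the coefficient of $a$: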
a=\frac{n\sum_{i=1}^n x_iy_i-\sum_{i=1}^n x_i \sum_{i=1}^n y_i}{n\sum_{i=1}^n x_i^2-(\sum_{i=1}^n x_i)^2}
After that, you can substitute this into equation (3) to obtain both parameters $a$ and $b$... but there are so many Σs that it is hard to read. Let's introduce shorthand letters to make it easier on the eyes.
a=\frac{nXY_{sum}-X_{sum}Y_{sum}}{nX^2_{sum}-(X_{sum})^2}\tag{4}
b=Y_{ave}-aX_{ave}\tag{5}
How about that, is it a little easier to read? The meanings of the letters are as follows.
・ $X_{sum}=\sum_{i=1}^n x_i$, $Y_{sum}=\sum_{i=1}^n y_i$, $XY_{sum}=\sum_{i=1}^n x_iy_i$, $X^2_{sum}=\sum_{i=1}^n x_i^2$
・ $X_{ave}$, $Y_{ave}$, $XY_{ave}$, $X^2_{ave}$: the corresponding averages, i.e. each sum divided by $n$
By the way, if you divide the numerator and denominator of equation (4) by $n^2$, you can express everything in terms of averages.
a=\frac{XY_{ave}-X_{ave}Y_{ave}}{X^2_{ave}-(X_{ave})^2}\tag{6}
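As a quick check that equations (4) and (6) agree (using the same data as in the next section):

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Y = np.array([2, 5, 3, 7, 12, 11, 15, 16, 19])
n = X.size

# Equation (4): sums form
a_sum = (n * np.sum(X * Y) - np.sum(X) * np.sum(Y)) / (n * np.sum(X**2) - np.sum(X)**2)
# Equation (6): averages form
a_ave = (np.mean(X * Y) - np.mean(X) * np.mean(Y)) / (np.mean(X**2) - np.mean(X)**2)
print(a_sum, a_ave)  # both 2.15
```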
Now that we know the principle, let's calculate with Python.
For the data, I used the data from the graph shown at the beginning. The source code is shown below. **numpy** was used for the numerical calculation.
```python
import numpy as np
import matplotlib.pyplot as plt

# Data to use
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Y = np.array([2, 5, 3, 7, 12, 11, 15, 16, 19])
plt.scatter(X, Y, color='red')

# Number of data points
n = Y.size

# Element-wise product of X and Y
XY = X * Y

# Square of each element of X
X_2 = X * X

# Derivation of the parameters (equations (4) and (5))
a = (n * np.sum(XY) - np.sum(X) * np.sum(Y)) / (n * np.sum(X_2) - np.sum(X)**2)
b = np.average(Y) - a * np.average(X)
print('a', a)
print('b', b)

# Fitted line
Y_fit = a * X + b
plt.plot(X, Y_fit)
plt.show()
```
The result output to the console is shown below.
a 2.15
b -0.75
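As a sanity check, numpy's built-in fitting gives the same parameters, since np.polyfit with degree 1 performs exactly this linear least squares fit:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Y = np.array([2, 5, 3, 7, 12, 11, 15, 16, 19])
a, b = np.polyfit(X, Y, 1)  # degree-1 least squares fit
print(a, b)  # approximately 2.15 and -0.75
```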
It seems that the slope is 2.15 and the intercept is -0.75. The graph is also shown below. As you can see, the linear function fits the data nicely. By the way, the error can be expressed as a function of two variables, $J(a, b)$. The 3D graph is shown below. The red point at coordinates $(x, y, z) = (2.15, -0.75, 16.6)$ marks the values of $a$ and $b$ obtained this time together with the value of the error $J$. As you can see in the graph, this point is the minimum. For reference, I will post the source code, so please run it and see for yourself.
```python
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# Definition of the error function J, expanded in terms of the sums
def Er(a, b, y_2, xy, y, x_2, x, n):
    return y_2 - 2*a*xy - 2*b*y + a**2*x_2 + 2*a*b*x + n*b**2

# Data to use
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Y = np.array([2, 5, 3, 7, 12, 11, 15, 16, 19])

# Number of data points
n = Y.size

# Element-wise product of X and Y
XY = X * Y

# Square of each element of X
X_2 = X * X

# Derivation of the parameters (equations (4) and (5))
a = (n * np.sum(XY) - np.sum(X) * np.sum(Y)) / (n * np.sum(X_2) - np.sum(X)**2)
b = np.average(Y) - a * np.average(X)
print(a)
print(b)

# Grid of (a, b) values centered on the solution
# (a coarse 0.001 step keeps the meshgrid small enough to plot)
A = np.arange(a - 0.05, a + 0.05, 0.001)
B = np.arange(b - 0.05, b + 0.05, 0.001)
A, B = np.meshgrid(A, B)
J = Er(A, B, np.sum(Y*Y), np.sum(XY), np.sum(Y), np.sum(X_2), np.sum(X), n)
ax.plot_wireframe(A, B, J)

# Error at the fitted parameters
j = Er(a, b, np.sum(Y*Y), np.sum(XY), np.sum(Y), np.sum(X_2), np.sum(X), n)
print(j)
ax.scatter(a, b, j, color='red')
ax.set_xlabel("A")
ax.set_ylabel("B")
ax.set_zlabel("J")
plt.show()
```
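To confirm numerically that $(a, b) = (2.15, -0.75)$ really minimizes $J$, here is a minimal sketch (assuming scipy is installed) that minimizes $J(a, b)$ directly from an arbitrary starting point:

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Y = np.array([2, 5, 3, 7, 12, 11, 15, 16, 19])

def J(params):
    a, b = params
    return np.sum((Y - a * X - b) ** 2)

# Start the search away from the solution on purpose
res = minimize(J, x0=[0.0, 0.0])
print(res.x)    # approximately [2.15, -0.75]
print(res.fun)  # approximately 16.6
```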
This time, I tried implementing the least squares method in Python. The least squares method belongs to statistics, a field you will need when studying AI. I am still studying it myself, so I will keep leaving notes like this one. If you have any questions, please leave a comment.