I'm studying machine learning and am writing these notes to organize things for myself. They are theory-centric and not fully mathematically rigorous. In this article, we cover straight-line fitting and curve fitting.
I want to find a straight line that fits $N$ data points $\{(x_1, t_1), \cdots, (x_N, t_N)\}$.
Let the prediction line (model) be $y = w_0 + w_1x$.
Focus on one data point $(x_n, t_n)$.
The predicted value at $x_n$ is $w_0 + w_1x_n$.
The squared difference from $t_n$ is $(w_0 + w_1x_n - t_n)^2$.
The sum of these over all data points is the objective function.
To minimize the objective function, partially differentiate it with respect to $w_0$ and $w_1$ and set each derivative to $0$.
This yields a simultaneous system of the following two equations (the normal equations):
\begin{eqnarray}
\sum_{n=1}^N(w_0+w_1x_n-t_n)&=&0\\
\sum_{n=1}^N(w_0+w_1x_n-t_n)x_n&=&0
\end{eqnarray}
Solving them for $w_0$ and $w_1$ determines the fitted line.
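Not part of the original derivation: here is a minimal NumPy sketch that solves this 2-by-2 system directly. The data x, t and the noise level are made-up assumptions for illustration.
# Straight-line fitting: solve the two normal equations directly
import numpy as np
np.random.seed(0)
N = 10
x = np.linspace(0, 1, N)
t = 2.0 * x + 1.0 + np.random.normal(0, 0.1, N)  # made-up data: a noisy line with w0=1, w1=2
# Normal equations written as A @ [w0, w1] = b
A = np.array([[N, x.sum()],
              [x.sum(), (x ** 2).sum()]])
b = np.array([t.sum(), (x * t).sum()])
w0, w1 = np.linalg.solve(A, b)
print(w0, w1)  # should be close to 1.0 and 2.0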
The setup is almost the same as for straight-line fitting.
I want to find a curve that fits $N$ data points $\{(x_1, t_1), \cdots, (x_N, t_N)\}$.
Let the prediction curve (model) be $y = w_0 + w_1x + w_2x^2 + \cdots + w_{D-1}x^{D-1}$.
Focus on one data point $(x_n, t_n)$.
The predicted value at $x_n$ is $w_0 + w_1x_n + w_2x_n^2 + \cdots + w_{D-1}x_n^{D-1}$.
The squared difference from $t_n$ is $(w_0 + w_1x_n + w_2x_n^2 + \cdots + w_{D-1}x_n^{D-1} - t_n)^2$.
The sum of these over all data points is the objective function.
Now define ${\bf w} = (w_0, w_1, \cdots, w_{D-1})^T$ and $\boldsymbol\phi(x) = (1, x, x^2, \cdots, x^{D-1})^T$. Then the model can be expressed as $y = {\bf w}^T\boldsymbol\phi(x)$.
Also, the objective function can be written as $E({\bf w}) = \frac{1}{2}\sum_{n=1}^N({\bf w}^T\boldsymbol\phi(x_n) - t_n)^2$ (the factor $\frac{1}{2}$ is only for convenience and does not change the minimizer).
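As a concrete instance of this notation, take $D = 3$ (a quadratic model):
\begin{eqnarray}
{\bf w}=\begin{pmatrix}w_0\\w_1\\w_2\end{pmatrix},\quad
\boldsymbol\phi(x)=\begin{pmatrix}1\\x\\x^2\end{pmatrix},\quad
{\bf w}^T\boldsymbol\phi(x)=w_0+w_1x+w_2x^2
\end{eqnarray}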
In addition, define
\begin{eqnarray}
\boldsymbol\Phi=\begin{pmatrix}
\boldsymbol\phi(x_1)^T\\
\vdots\\
\boldsymbol\phi(x_N)^T
\end{pmatrix},\quad
{\bf t}=(t_1,\cdots,t_N)^T
\end{eqnarray}
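Written out with the polynomial basis above, $\boldsymbol\Phi$ is the $N\times D$ matrix of input powers (a Vandermonde-style matrix):
\begin{eqnarray}
\boldsymbol\Phi=\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{D-1}\\
\vdots & \vdots & \vdots & & \vdots\\
1 & x_N & x_N^2 & \cdots & x_N^{D-1}
\end{pmatrix}
\end{eqnarray}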
With these definitions, the objective function can be written as $E({\bf w})=\frac{1}{2}||\boldsymbol\Phi{\bf w}-{\bf t}||^2$.
To minimize the objective function, partially differentiate it with respect to ${\bf w}$ and set the result equal to ${\bf 0}$; solving that equation determines ${\bf w}$ and hence the prediction curve. The derivation follows.
\begin{eqnarray}
\frac{\partial}{\partial {\bf w}}E&=&\frac{\partial}{\partial {\bf w}}\frac{1}{2}||\boldsymbol\Phi{\bf w}-{\bf t}||^2\\
&=&\frac{1}{2}\frac{\partial}{\partial {\bf w}}(\boldsymbol\Phi{\bf w}-{\bf t})^T(\boldsymbol\Phi{\bf w}-{\bf t})\\
&=&\frac{1}{2}\frac{\partial}{\partial {\bf w}}({\bf w}^T\boldsymbol\Phi^T-{\bf t}^T)(\boldsymbol\Phi{\bf w}-{\bf t})\\
&=&\frac{1}{2}\frac{\partial}{\partial {\bf w}}({\bf w}^T\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-{\bf w}^T\boldsymbol\Phi^T{\bf t}-{\bf t}^T\boldsymbol\Phi{\bf w}+{\bf t}^T{\bf t})\\
&=&\frac{1}{2}\frac{\partial}{\partial {\bf w}}({\bf w}^T\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-2{\bf w}^T\boldsymbol\Phi^T{\bf t}+||{\bf t}||^2)\\
&=&\frac{1}{2}(2\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-2\boldsymbol\Phi^T{\bf t})\\
&=&\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-\boldsymbol\Phi^T{\bf t}
\end{eqnarray}
Setting this equal to ${\bf 0}$,
\begin{eqnarray}
&&\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-\boldsymbol\Phi^T{\bf t}={\bf 0}\\
&&\Leftrightarrow \boldsymbol\Phi^T\boldsymbol\Phi{\bf w}=\boldsymbol\Phi^T{\bf t}\\
&&\Leftrightarrow {\bf w}=(\boldsymbol\Phi^T\boldsymbol\Phi)^{-1}\boldsymbol\Phi^T{\bf t}
\end{eqnarray}
where the last step assumes that $\boldsymbol\Phi^T\boldsymbol\Phi$ is invertible (regular).
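In code it is usually better to solve the normal equations $\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}=\boldsymbol\Phi^T{\bf t}$ directly rather than forming the explicit inverse. A minimal sketch, assuming Phi and t have already been built as in the full script below:
# Solve the normal equations without an explicit inverse
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)
# Alternatively, use least squares, which also handles a rank-deficient Phi
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)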
You can run it on Google Colab from here.
###################################
#Linear regression-Curve fitting
###################################
import numpy as np
import matplotlib.pyplot as plt
# Build the design matrix Phi
def get_phi_matrix(N, M, x):
    Phi = np.zeros((N, M))  # N rows, M columns, initialized to zeros
    for i in range(M):
        Phi[:, i] = x.T ** i  # fill one column at a time; Phi[:, i].shape is (N,), so the N-by-1 matrix x is transposed
    return Phi
# Fix the random seed
np.random.seed(1)
# Number of polynomial terms (w0*x^0 + ... + w_{M-1}*x^{M-1}), i.e. a polynomial of degree M-1
M = 4
# Number of training data points
N = 10
# Training data x as a column vector (represented as an N-by-1 matrix)
x = np.linspace(0, 1, N).reshape(N, 1)
# Training data t as a column vector
t = np.sin(2 * np.pi * x.T) + np.random.normal(0, 0.2, N)  # sin curve plus Gaussian noise
t = t.T  # make it a column vector (an N-by-1 matrix); t = t.reshape(N, 1) also works
# Build the design matrix Phi
Phi = get_phi_matrix(N, M, x)
#Find w analytically
w = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ t
#Based on the obtained coefficient w, find the predicted value y for the new input x2.
x2 = np.linspace(0, 1, 100).reshape(100, 1)  # 100 evenly spaced points from 0 to 1 (spacing 1/99)
#Create matrix Phi2
Phi2 = get_phi_matrix(100, M, x2)
#Predict y= w0*x^0 + ... + wM-1*x^M-1
y = Phi2 @ w #No regularization
#View results
plt.xlim(-0.1, 1.1)  # Limit the display range of x
plt.ylim(-1.1, 1.1) #Limit the display range of y
plt.scatter(x, t, color="red") #Display of training data
plt.plot(x2, y, color="blue") #Draw a prediction curve without regularization
plt.show() #Graph drawing
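As an optional sanity check (my addition), the same coefficients can be recovered with np.polyfit, which returns them highest power first:
# Cross-check the analytic solution against np.polyfit
w_polyfit = np.polyfit(x.flatten(), t.flatten(), M - 1)  # coefficients, highest power first
print(w.flatten())       # w0, w1, ..., w_{M-1}
print(w_polyfit[::-1])   # reversed so the ordering matches w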
To show that the objective function is convex, it suffices to show that its Hessian matrix is positive semi-definite. The proof follows.
First, find the Hessian matrix of the objective function.
\begin{eqnarray}
\frac{\partial}{\partial {\bf w}}\left(\frac{\partial}{\partial {\bf w}}E\right)&=&\frac{\partial}{\partial {\bf w}}(\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-\boldsymbol\Phi^T{\bf t})\\
&=&\boldsymbol\Phi^T\boldsymbol\Phi \\
\end{eqnarray}\\
Next, we show that the Hessian matrix is positive semi-definite.
Let ${\bf x}\in\mathbb{R}^D$ be arbitrary.
\begin{eqnarray}
{\bf x}^T\boldsymbol\Phi^T\boldsymbol\Phi{\bf x}&=&(\boldsymbol\Phi{\bf x})^T(\boldsymbol\Phi{\bf x})\\
&=&||\boldsymbol\Phi{\bf x}||^2\geq0\\
\end{eqnarray}\\
This shows that the Hessian matrix of the objective function is positive semi-definite, and therefore the objective function is convex.
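As a quick numerical illustration (my addition, reusing the Phi built in the script above), the eigenvalues of the Hessian $\boldsymbol\Phi^T\boldsymbol\Phi$ can be checked to be non-negative:
# Numerically confirm that the Hessian Phi^T Phi is positive semi-definite
H = Phi.T @ Phi
eigenvalues = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix, in ascending order
print(eigenvalues)  # all values should be >= 0 (up to floating-point error)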