I'm studying machine learning and am writing these notes to organize things for myself. They are theory-centric and not fully mathematically rigorous. In this article, we cover straight-line fitting and curve fitting.
I want to find a straight line that fits $N$ data points $\{(x_1, t_1), \cdots, (x_N, t_N)\}$.
Let the prediction line (model) be $y = w_0 + w_1x$.
Focus on one data point $(x_n, t_n)$.
The predicted value at $x_n$ is $w_0 + w_1x_n$.
The squared difference from $t_n$ is $(w_0 + w_1x_n - t_n)^2$.
The sum of these over all data points is the objective function.
To minimize the objective function, partially differentiate it with respect to $w_0$ and $w_1$ and set each derivative to $0$.
This yields a simultaneous system of the following two equations (the normal equations):
\begin{eqnarray}
\sum_{n=1}^N(w_0+w_1x_n-t_n)&=&0\\
\sum_{n=1}^N(w_0+w_1x_n-t_n)x_n&=&0
\end{eqnarray}
Solving them for $w_0$ and $w_1$ determines the fitted line.
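Not part of the original derivation: here is a minimal NumPy sketch that solves this 2-by-2 system directly. The data x, t and the noise level are made-up assumptions for illustration.
# Straight-line fitting: solve the two normal equations directly
import numpy as np
np.random.seed(0)
N = 10
x = np.linspace(0, 1, N)
t = 2.0 * x + 1.0 + np.random.normal(0, 0.1, N)  # made-up data: a noisy line with w0=1, w1=2
# Normal equations written as A @ [w0, w1] = b
A = np.array([[N, x.sum()],
              [x.sum(), (x ** 2).sum()]])
b = np.array([t.sum(), (x * t).sum()])
w0, w1 = np.linalg.solve(A, b)
print(w0, w1)  # should be close to 1.0 and 2.0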
The setup is almost the same as for straight-line fitting.
I want to find a curve that fits $N$ data points $\{(x_1, t_1), \cdots, (x_N, t_N)\}$.
Let the prediction curve (model) be $y = w_0 + w_1x + w_2x^2 + \cdots + w_{D-1}x^{D-1}$.
Focus on one data point $(x_n, t_n)$.
The predicted value at $x_n$ is $w_0 + w_1x_n + w_2x_n^2 + \cdots + w_{D-1}x_n^{D-1}$.
The squared difference from $t_n$ is $(w_0 + w_1x_n + w_2x_n^2 + \cdots + w_{D-1}x_n^{D-1} - t_n)^2$.
The sum of these over all data points is the objective function.
Now define ${\bf w} = (w_0, w_1, \cdots, w_{D-1})^T$ and $\boldsymbol\phi(x) = (1, x, x^2, \cdots, x^{D-1})^T$. Then the model can be expressed as $y = {\bf w}^T\boldsymbol\phi(x)$.
Also, the objective function can be written as $E({\bf w}) = \frac{1}{2}\sum_{n=1}^N({\bf w}^T\boldsymbol\phi(x_n) - t_n)^2$ (the factor $\frac{1}{2}$ is only for convenience and does not change the minimizer).
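As a concrete instance of this notation, take $D = 3$ (a quadratic model):
\begin{eqnarray}
{\bf w}=\begin{pmatrix}w_0\\w_1\\w_2\end{pmatrix},\quad
\boldsymbol\phi(x)=\begin{pmatrix}1\\x\\x^2\end{pmatrix},\quad
{\bf w}^T\boldsymbol\phi(x)=w_0+w_1x+w_2x^2
\end{eqnarray}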
In addition, define
\begin{eqnarray}
\boldsymbol\Phi=\begin{pmatrix}
\boldsymbol\phi(x_1)^T\\
\vdots\\
\boldsymbol\phi(x_N)^T
\end{pmatrix},\quad
{\bf t}=(t_1,\cdots,t_N)^T
\end{eqnarray}
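Written out with the polynomial basis above, $\boldsymbol\Phi$ is the $N\times D$ matrix of input powers (a Vandermonde-style matrix):
\begin{eqnarray}
\boldsymbol\Phi=\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{D-1}\\
\vdots & \vdots & \vdots & & \vdots\\
1 & x_N & x_N^2 & \cdots & x_N^{D-1}
\end{pmatrix}
\end{eqnarray}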
With these definitions, the objective function can be written as $E({\bf w})=\frac{1}{2}||\boldsymbol\Phi{\bf w}-{\bf t}||^2$.
To minimize the objective function, partially differentiate it with respect to ${\bf w}$ and set the result equal to ${\bf 0}$; solving that equation determines ${\bf w}$ and hence the prediction curve. The derivation follows.
\begin{eqnarray}
\frac{\partial}{\partial {\bf w}}E&=&\frac{\partial}{\partial {\bf w}}\frac{1}{2}||\boldsymbol\Phi{\bf w}-{\bf t}||^2\\
&=&\frac{1}{2}\frac{\partial}{\partial {\bf w}}(\boldsymbol\Phi{\bf w}-{\bf t})^T(\boldsymbol\Phi{\bf w}-{\bf t})\\
&=&\frac{1}{2}\frac{\partial}{\partial {\bf w}}({\bf w}^T\boldsymbol\Phi^T-{\bf t}^T)(\boldsymbol\Phi{\bf w}-{\bf t})\\
&=&\frac{1}{2}\frac{\partial}{\partial {\bf w}}({\bf w}^T\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-{\bf w}^T\boldsymbol\Phi^T{\bf t}-{\bf t}^T\boldsymbol\Phi{\bf w}+{\bf t}^T{\bf t})\\
&=&\frac{1}{2}\frac{\partial}{\partial {\bf w}}({\bf w}^T\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-2{\bf w}^T\boldsymbol\Phi^T{\bf t}+||{\bf t}||^2)\\
&=&\frac{1}{2}(2\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-2\boldsymbol\Phi^T{\bf t})\\
&=&\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-\boldsymbol\Phi^T{\bf t}
\end{eqnarray}
Setting this equal to ${\bf 0}$,
\begin{eqnarray}
&&\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-\boldsymbol\Phi^T{\bf t}={\bf 0}\\
&&\Leftrightarrow \boldsymbol\Phi^T\boldsymbol\Phi{\bf w}=\boldsymbol\Phi^T{\bf t}\\
&&\Leftrightarrow {\bf w}=(\boldsymbol\Phi^T\boldsymbol\Phi)^{-1}\boldsymbol\Phi^T{\bf t}
\end{eqnarray}
where the last step assumes that $\boldsymbol\Phi^T\boldsymbol\Phi$ is invertible (regular).
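In code it is usually better to solve the normal equations $\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}=\boldsymbol\Phi^T{\bf t}$ directly rather than forming the explicit inverse. A minimal sketch, assuming Phi and t have already been built as in the full script below:
# Solve the normal equations without an explicit inverse
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)
# Alternatively, use least squares, which also handles a rank-deficient Phi
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)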
You can run it on Google Colab from here.
###################################
#Linear regression-Curve fitting
###################################
import numpy as np
import matplotlib.pyplot as plt
# Build the design matrix Phi
def get_phi_matrix(N, M, x):
    Phi = np.zeros((N, M))  # N rows, M columns, initialized to zeros
    for i in range(M):
        Phi[:, i] = x.T ** i  # fill one column at a time; Phi[:, i].shape is (N,), so the N-by-1 matrix x is transposed
    return Phi
# Fix the random seed
np.random.seed(1)
# Number of polynomial terms (w0*x^0 + ... + w_{M-1}*x^{M-1}), i.e. a polynomial of degree M-1
M = 4
# Number of training data points
N = 10
# Training data x as a column vector (represented as an N-by-1 matrix)
x = np.linspace(0, 1, N).reshape(N, 1)
# Training data t as a column vector
t = np.sin(2 * np.pi * x.T) + np.random.normal(0, 0.2, N)  # sin curve plus Gaussian noise
t = t.T  # make it a column vector (an N-by-1 matrix); t = t.reshape(N, 1) also works
# Build the design matrix Phi
Phi = get_phi_matrix(N, M, x)
#Find w analytically
w = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ t
#Based on the obtained coefficient w, find the predicted value y for the new input x2.
x2 = np.linspace(0, 1, 100).reshape(100, 1)  # 100 evenly spaced points from 0 to 1 (spacing 1/99)
#Create matrix Phi2
Phi2 = get_phi_matrix(100, M, x2)
#Predict y= w0*x^0 + ... + wM-1*x^M-1
y = Phi2 @ w #No regularization
#View results
plt.xlim(-0.1, 1.1)  # Limit the display range of x
plt.ylim(-1.1, 1.1) #Limit the display range of y
plt.scatter(x, t, color="red") #Display of training data
plt.plot(x2, y, color="blue") #Draw a prediction curve without regularization
plt.show() #Graph drawing
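As an optional sanity check (my addition), the same coefficients can be recovered with np.polyfit, which returns them highest power first:
# Cross-check the analytic solution against np.polyfit
w_polyfit = np.polyfit(x.flatten(), t.flatten(), M - 1)  # coefficients, highest power first
print(w.flatten())       # w0, w1, ..., w_{M-1}
print(w_polyfit[::-1])   # reversed so the ordering matches w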
To show that the objective function is convex, it suffices to show that its Hessian matrix is positive semi-definite. The proof follows.
First, find the Hessian matrix of the objective function.
\begin{eqnarray}
\frac{\partial}{\partial {\bf w}}\left(\frac{\partial}{\partial {\bf w}}E\right)&=&\frac{\partial}{\partial {\bf w}}(\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-\boldsymbol\Phi^T{\bf t})\\
&=&\boldsymbol\Phi^T\boldsymbol\Phi \\
\end{eqnarray}\\
Next, we show that the Hessian matrix is positive semi-definite.
Let ${\bf x}\in\mathbb{R}^D$ be arbitrary.
\begin{eqnarray}
{\bf x}^T\boldsymbol\Phi^T\boldsymbol\Phi{\bf x}&=&(\boldsymbol\Phi{\bf x})^T(\boldsymbol\Phi{\bf x})\\
&=&||\boldsymbol\Phi{\bf x}||^2\geq0\\
\end{eqnarray}\\
This shows that the Hessian matrix of the objective function is positive semi-definite, and therefore the objective function is convex.
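As a quick numerical illustration (my addition, reusing the Phi built in the script above), the eigenvalues of the Hessian $\boldsymbol\Phi^T\boldsymbol\Phi$ can be checked to be non-negative:
# Numerically confirm that the Hessian Phi^T Phi is positive semi-definite
H = Phi.T @ Phi
eigenvalues = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix, in ascending order
print(eigenvalues)  # all values should be >= 0 (up to floating-point error)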