[PYTHON] [Machine learning] Linear regression: line fitting and curve fitting

Introduction

I'm studying machine learning, and these are notes I'm putting together for myself. They are theory-centric and not mathematically rigorous. In this article we cover "line fitting" and "curve fitting".

Line fitting

Requirements

We want to find a straight line that fits the $N$ data points $\{(x_1, t_1), \cdots, (x_N, t_N)\}$.


Model definition

Let the prediction line (model) be $y = w_0 + w_1x$.

Definition of objective function

Focus on one data point $(x_n, t_n)$. The predicted value at $x_n$ is $w_0 + w_1x_n$, and the squared difference from $t_n$ is $(w_0 + w_1x_n - t_n)^2$. Summing these over all data points gives the objective function:

E(w_0,w_1)=\frac{1}{2}\sum_{n=1}^N(w_0+w_1x_n-t_n)^2

The factor $\frac{1}{2}$ is there so that the expression looks cleaner after partial differentiation with respect to $w_0$ and $w_1$.
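For concreteness, here is a minimal Python sketch (with made-up toy values, not from the original article) of evaluating this objective:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])  # toy inputs
t = np.array([0.1, 0.9, 2.2, 2.9])  # toy targets

def objective(w0, w1):
    # E(w0, w1) = (1/2) * sum_n (w0 + w1*x_n - t_n)^2
    return 0.5 * np.sum((w0 + w1 * x - t) ** 2)

print(objective(0.0, 1.0))  # error of the candidate line y = x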

Minimize the objective function

To minimize the objective function, partially differentiate it with respect to $w_0$ and $w_1$ and set each derivative to $0$. This yields a system of the following two equations:

\frac{\partial}{\partial w_0}E=\sum_{n=1}^N(w_0+w_1x_n-t_n)=0

\frac{\partial}{\partial w_1}E=\sum_{n=1}^Nx_n(w_0+w_1x_n-t_n)=0

Solving this system gives $w_0$ and $w_1$, which determines the prediction line. (We do not solve it here, because line fitting is a special case of curve fitting.)
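Although the article defers solving the system, here is a minimal numpy sketch (same toy data as above) of solving the two normal equations directly:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([0.1, 0.9, 2.2, 2.9])
N = len(x)

# Rearranged, the two equations form the linear system:
#   N*w0         + (sum x_n)   * w1 = sum t_n
#   (sum x_n)*w0 + (sum x_n^2) * w1 = sum x_n*t_n
A = np.array([[N,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
b = np.array([t.sum(), (x * t).sum()])
w0, w1 = np.linalg.solve(A, b)
print(w0, w1)  # intercept and slope of the fitted line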

Curve fitting

Requirements

The requirements are almost the same as for line fitting: we want to find a curve that fits the $N$ data points $\{(x_1, t_1), \cdots, (x_N, t_N)\}$.

Model definition

Let the prediction curve (model) be $y = w_0 + w_1x + w_2x^2 + \cdots + w_{D-1}x^{D-1}$.

Definition of objective function

Focus on one data point $(x_n, t_n)$. The predicted value at $x_n$ is $w_0 + w_1x_n + w_2x_n^2 + \cdots + w_{D-1}x_n^{D-1}$, and the squared difference from $t_n$ is $(w_0 + w_1x_n + w_2x_n^2 + \cdots + w_{D-1}x_n^{D-1} - t_n)^2$. Summing these over all data points gives the objective function:

E(w_0,\cdots,w_{D-1})=\frac{1}{2}\sum_{n=1}^N(w_0+w_1x_n+w_2x_n^2+\cdots+w_{D-1}x_n^{D-1}-t_n)^2

The factor $\frac{1}{2}$ is there so that the expression looks cleaner after partial differentiation with respect to $w_0, \cdots, w_{D-1}$.

Concisely represent the model and objective function

Define

{\bf w}=(w_0, w_1, w_2, \cdots, w_{D-1})^T

\phi_i(x)=x^i

{\boldsymbol\phi}(x)=(\phi_0(x), \phi_1(x), \phi_2(x), \cdots, \phi_{D-1}(x))^T

Then the model can be written as $y = {\bf w}^T\boldsymbol\phi(x)$.

The objective function can likewise be written as $E({\bf w})=\frac{1}{2}\sum_{n=1}^N({\bf w}^T\boldsymbol\phi(x_n)-t_n)^2$.

Furthermore, defining

\boldsymbol\Phi=\begin{pmatrix}\boldsymbol\phi(x_1)^T\\ \vdots\\ \boldsymbol\phi(x_N)^T\end{pmatrix},\quad {\bf t}=(t_1,\cdots,t_N)^T

the objective function can be written as

E({\bf w})=\frac{1}{2}||\boldsymbol\Phi{\bf w}-{\bf t}||^2
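To make the notation concrete, here is a minimal Python sketch (with arbitrary toy values, not from the original article) checking that the summation form and the matrix form of the objective agree:

import numpy as np

x = np.array([0.0, 0.5, 1.0])  # toy inputs
t = np.array([0.0, 1.0, 0.0])  # toy targets
D = 3                          # number of basis functions phi_0 .. phi_{D-1}

Phi = np.stack([x ** i for i in range(D)], axis=1)  # N-by-D design matrix
w = np.array([0.1, 0.2, 0.3])                       # arbitrary weight vector

E_sum = 0.5 * sum((w @ Phi[n] - t[n]) ** 2 for n in range(len(x)))  # summation form
E_mat = 0.5 * np.linalg.norm(Phi @ w - t) ** 2                      # matrix form
print(np.isclose(E_sum, E_mat))  # True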

Minimize the objective function

To minimize the objective function, partially differentiate it with respect to ${\bf w}$ and set the gradient to ${\bf 0}$. Solving the resulting equation determines ${\bf w}$ and hence the prediction curve. The derivation is below.

\begin{eqnarray}
\frac{\partial}{\partial {\bf w}}E&=&\frac{\partial}{\partial {\bf w}}\frac{1}{2}||\boldsymbol\Phi{\bf w}-{\bf t}||^2\\
&=&\frac{1}{2}\frac{\partial}{\partial {\bf w}}(\boldsymbol\Phi{\bf w}-{\bf t})^T(\boldsymbol\Phi{\bf w}-{\bf t})\\
&=&\frac{1}{2}\frac{\partial}{\partial {\bf w}}({\bf w}^T\boldsymbol\Phi^T-{\bf t}^T)(\boldsymbol\Phi{\bf w}-{\bf t})\\
&=&\frac{1}{2}\frac{\partial}{\partial {\bf w}}({\bf w}^T\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-{\bf w}^T\boldsymbol\Phi^T{\bf t}-{\bf t}^T\boldsymbol\Phi{\bf w}+{\bf t}^T{\bf t})\\
&=&\frac{1}{2}\frac{\partial}{\partial {\bf w}}({\bf w}^T\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-2{\bf w}^T\boldsymbol\Phi^T{\bf t}+||{\bf t}||^2)\\
&=&\frac{1}{2}(2\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-2\boldsymbol\Phi^T{\bf t})\\
&=&\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-\boldsymbol\Phi^T{\bf t}
\end{eqnarray}

Setting this gradient to ${\bf 0}$:

\begin{eqnarray}
&&\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-\boldsymbol\Phi^T{\bf t}={\bf 0}\\
&&\Leftrightarrow \boldsymbol\Phi^T\boldsymbol\Phi{\bf w}=\boldsymbol\Phi^T{\bf t}\\
&&\Leftrightarrow {\bf w}=(\boldsymbol\Phi^T\boldsymbol\Phi)^{-1}\boldsymbol\Phi^T{\bf t}
\end{eqnarray}

where the last step assumes $\boldsymbol\Phi^T\boldsymbol\Phi$ is nonsingular.
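As a side note (an addition, not part of the original derivation), this closed form maps directly to one line of numpy; when $\boldsymbol\Phi^T\boldsymbol\Phi$ is nearly singular, `np.linalg.lstsq` is the numerically safer way to obtain the same least-squares solution:

import numpy as np

x = np.linspace(0, 1, 10)                           # toy inputs
t = np.sin(2 * np.pi * x)                           # toy targets
Phi = np.stack([x ** i for i in range(4)], axis=1)  # N-by-D design matrix

w = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ t          # the closed form derived above
w_lstsq, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # least-squares solver
print(np.allclose(w, w_lstsq))                      # True for this well-conditioned Phi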
Source

You can run it on Google Colab from here.

###################################
# Linear regression: curve fitting
###################################
import numpy as np
import matplotlib.pyplot as plt

# Build the design matrix Φ
def get_phi_matrix(N, M, x):
    Phi = np.zeros((N, M))    # N rows, M columns
    for i in range(M):
        Phi[:, i] = x.T ** i  # Phi[:, i].shape is (N,), so x (an N-by-1 matrix) is transposed before broadcasting
    return Phi

# Fix the random seed for reproducibility
np.random.seed(1)
# Number of polynomial terms (w0*x^0 + ... + w_{M-1}*x^{M-1}), i.e. degree M-1
M = 4
# Number of training data points
N = 10

# Training inputs as a column vector (represented as an N-by-1 matrix)
x = np.linspace(0, 1, N).reshape(N, 1)

# Training targets t
t = np.sin(2 * np.pi * x.T) + np.random.normal(0, 0.2, N)  # sin curve plus normally distributed noise
t = t.T  # Make it a column vector (an N-by-1 matrix); t = t.reshape(N, 1) also works

# Build the design matrix Φ
Phi = get_phi_matrix(N, M, x)

# Solve for w analytically: w = (Φ^T Φ)^{-1} Φ^T t
w = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ t

# Using the obtained coefficients w, compute predictions y for new inputs x2
x2 = np.linspace(0, 1, 100).reshape(100, 1)  # 100 evenly spaced points from 0 to 1 (spacing 1/99)

# Design matrix for the new inputs
Phi2 = get_phi_matrix(100, M, x2)

# Predict y = w0*x^0 + ... + w_{M-1}*x^{M-1}
y = Phi2 @ w  # No regularization

# Plot the results
plt.xlim(-0.1, 1.1)  # Limit the display range of x
plt.ylim(-1.1, 1.1)  # Limit the display range of y
plt.scatter(x, t, color="red")  # Training data
plt.plot(x2, y, color="blue")   # Prediction curve (no regularization)
plt.show()
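As an optional cross-check (my addition, appended to the script above), `np.polyfit` solves the same least-squares problem; it returns coefficients from the highest degree down, so the order must be reversed before comparing:

# Cross-check against np.polyfit (coefficients come highest-degree first)
w_check = np.polyfit(x.ravel(), t.ravel(), M - 1)[::-1]
print(np.allclose(w.ravel(), w_check))  # expected: True
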
The objective function is a convex function

To show that the objective function is convex, it suffices to show that its Hessian matrix is positive semi-definite. The proof is below.

First, find the Hessian matrix of the objective function.

\begin{eqnarray}
\frac{\partial}{\partial {\bf w}}\left(\frac{\partial}{\partial {\bf w}}E\right)&=&\frac{\partial}{\partial {\bf w}}(\boldsymbol\Phi^T\boldsymbol\Phi{\bf w}-\boldsymbol\Phi^T{\bf t})\\
&=&\boldsymbol\Phi^T\boldsymbol\Phi
\end{eqnarray}

Next, we show that the Hessian matrix is positive semi-definite.

Let ${\bf x}\in\mathbb{R}^D$. Then

\begin{eqnarray}
{\bf x}^T\boldsymbol\Phi^T\boldsymbol\Phi{\bf x}&=&(\boldsymbol\Phi{\bf x})^T(\boldsymbol\Phi{\bf x})\\
&=&||\boldsymbol\Phi{\bf x}||^2\geq0
\end{eqnarray}

This shows that the Hessian matrix of the objective function is positive semi-definite; therefore, the objective function is convex.
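As a quick numerical illustration (not a substitute for the proof), the eigenvalues of $\boldsymbol\Phi^T\boldsymbol\Phi$ for an arbitrary design matrix are indeed non-negative:

import numpy as np

x = np.linspace(0, 1, 10)
Phi = np.stack([x ** i for i in range(4)], axis=1)  # arbitrary N-by-D design matrix
H = Phi.T @ Phi                                     # Hessian of the objective
print(np.linalg.eigvalsh(H))                        # all >= 0 (up to rounding error)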
