[PYTHON] Kernel regression with Numpy only


The kernel method is a powerful tool for nonlinear data analysis. It frequently appears as a building block of algorithms such as soft-margin SVMs, Gaussian processes, and clustering. In this post, I reproduce the procedure for solving a regression problem with the kernel method in Python.

TL;DR All the code is in this repository: https://github.com/yumaloop/KernelRegression



What is kernel regression?

Kernel functions and the Gram matrix

Given $n$ data samples $\mathcal{D} = \left\{ x^{(i)}, y^{(i)} \right\} \ (i = 1, \dots, n)$, we approximate the function $y = f(x)$. The regression model is expressed by the following equation using the kernel function $k(\cdot, \cdot)$ and the parameters $\alpha_i \ (i = 1, \dots, n)$.

\hat{y}^{(i)} = \sum_{j=1}^{n} \alpha_j k(x^{(i)}, x^{(j)}) \quad (i = 1, \dots, n)

The difference between the estimated value $\hat{y}^{(i)}$ and the true value $y^{(i)}$ is the residual. To fit the model to the data sample, we take the sum (or average) of the squared residuals as the loss function of the model and find the optimal parameters $\alpha_i \ (i = 1, \dots, n)$ that minimize it.

\begin{align} \mathrm{residual}(y^{(i)}, \hat{y}^{(i)}) &= y^{(i)} - \hat{y}^{(i)} \\ &= y^{(i)} - \sum_{j=1}^{n} \alpha_j k(x^{(i)}, x^{(j)}) \end{align}
\begin{align} L(\alpha) &= \frac{1}{n} \sum_{i=1}^{n} { \mathrm{residual}(y^{(i)}, \hat{y}^{(i)}) }^{2} \\ &= \frac{1}{n} \sum_{i=1}^{n} { \left( y^{(i)} - \sum_{j=1}^{n} \alpha_j k(x^{(i)}, x^{(j)}) \right) }^{2} \\ &= \frac{1}{n} {({\bf y} - {\bf K} {\bf \alpha})}^{\mathrm{T}} ({\bf y} - {\bf K} {\bf \alpha}) \end{align}
\mathrm{where} \quad {\bf K} = \begin{pmatrix} k(x^{(1)}, x^{(1)}) & \cdots & k(x^{(1)}, x^{(n)}) \\ \vdots & \ddots & \vdots \\ k(x^{(n)}, x^{(1)}) & \cdots & k(x^{(n)}, x^{(n)}) \end{pmatrix}

${\bf K}$ is the matrix computed from the data sample under a given kernel function, and is called the Gram matrix.
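As a minimal sketch of these definitions (the toy kernel `k`, the toy data, and the value of `sigma` below are placeholders for illustration, not the experiment that follows), the Gram matrix and the loss $L(\alpha)$ can be computed with NumPy like this:

import numpy as np

def gram_matrix(k, X):
    # K[i, j] = k(x_i, x_j) for every pair of samples in X
    n = X.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = k(X[i], X[j])
    return K

def loss(K, y, alpha):
    # L(alpha) = (1/n) * (y - K alpha)^T (y - K alpha)
    r = y - K @ alpha
    return (r @ r) / len(y)

# toy usage with an RBF kernel (sigma = 0.5 chosen arbitrarily)
k = lambda x, y: np.exp(-np.sum((x - y)**2) / (2 * 0.5**2))
X = np.linspace(-1, 1, 5)
y = np.sin(X)
alpha = np.ones(X.shape[0])
print(loss(gram_matrix(k, X), y, alpha))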

Parameter search by gradient descent

Plain (naive) gradient descent is used to search for the parameters $\alpha_i \ (i = 1, \dots, n)$. The coefficient $\gamma$ is the learning rate; the constant factor $1/n$ in $L(\alpha)$ is absorbed into it.

\frac{\partial L(\alpha)}{\partial {\bf \alpha}} = -2 \cdot {\bf K}^{\mathrm{T}} ({\bf y} - {\bf K} {\bf \alpha}) \\ {\alpha}_{new} = {\alpha}_{old} - \gamma \cdot \frac{\partial L(\alpha)}{\partial {\bf \alpha}} \Big|_{ \alpha = {\alpha}_{old} }
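As a sketch of one update step (assuming `K`, `y`, and `alpha` are defined as in the snippet above, and `gamma` is the learning rate):

def gradient_step(K, y, alpha, gamma=0.01):
    # dL/d(alpha) = -2 K^T (y - K alpha); move against the gradient
    grad = -2 * K.T @ (y - K @ alpha)
    return alpha - gamma * grad

# repeated for nb_epoch steps: alpha = gradient_step(K, y, alpha)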

The RBF kernel (Gaussian kernel) is adopted as the kernel function.

k(x, x') := \exp \left( - \frac{\| x - x' \|^{2}}{2 \sigma^{2}} \right)

Implement kernel regression in Python (numpy only)

All of the code can be found in the repository below: https://github.com/yumaloop/KernelRegression

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Kernel Function
class RBFkernel():
    def __init__(self, sigma=0.5):
        self.sigma = sigma
        
    def __call__(self, x, y):
        numerator = -1 * np.sum((x - y)**2)
        denominator = 2 * (self.sigma**2)
        return np.exp(numerator / denominator)
    
    def get_params(self):
        return self.sigma
    
    def set_params(self, sigma):
        self.sigma = sigma

# Regression Model
class KernelRegression():
    def __init__(self, kernel):
        self.kernel = kernel
        
    def fit_kernel(self, X, y, lr=0.01, nb_epoch=1000, log_freq=50):
        self.X = X
        self.y = y
        self.n = X.shape[0] # sample size
        self.alpha = np.full(self.n, 1) # param alpha: initialize
        self.gram_matrix = np.zeros((self.n, self.n))
        
        # Gradient Descent Algorithm to optimize alpha
        for epoch in range(nb_epoch):

            # Gram Matrix
            for i in range(self.n):
                for j in range(self.n):
                    self.gram_matrix[i][j] = self.kernel(self.X[i], self.X[j])

            # update alpha once per epoch, after the full Gram matrix is built
            self.loss, self.loss_grad = self.mse(self.X, self.y, self.alpha, self.gram_matrix)
            self.alpha = self.alpha - lr * self.loss_grad

            if epoch % log_freq == 0:
                print("epoch: {} \t MSE of sample data: {:.4f}".format(epoch, self.loss))
                        
                        
    def mse(self, X, y, alpha, gram_matrix):
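        # loss = (y - K alpha)^T (y - K alpha), gradient = -2 K^T (y - K alpha)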
        loss = np.dot((y - np.dot(gram_matrix, alpha)), (y - np.dot(gram_matrix, alpha)))
        loss_grad = -2 * np.dot(gram_matrix.T, (y - np.dot(gram_matrix, alpha)))
        return loss, loss_grad
    
    def predict(self, X_new):
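        # prediction: y_new^(i) = sum_j alpha_j * k(x_new^(i), x^(j))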
        n_new = X_new.shape[0]
        y_new = np.zeros(n_new)
        for i in range(n_new):
            for j in range(self.n):
                y_new[i] += self.alpha[j] * self.kernel(X_new[i], self.X[j])
        return y_new
# Experiment

def actual_function(x):
    return 1.7 * np.sin(2 * x) + np.cos(1.5 * x) + 0.5 * np.cos(0.5 * x) + 0.5 * x  + 0.1

sample_size = 100 # the number of data sample
var_noise = 0.7 # variance of the gaussian noise for samples

# make data sample
x_sample = np.random.rand(sample_size) * 10 - 5
y_sample = actual_function(x_sample) + np.random.normal(0, var_noise, sample_size)

# variables for plot (actual function)
x_plot = np.linspace(-5, 5, 100)
y_plot = actual_function(x_plot)

# plot figure
plt.plot(x_plot, y_plot, color="blue", linestyle="dotted", label="actual function")
plt.scatter(x_sample, y_sample, alpha=0.4, color="blue", label="data sample")
plt.title("Actual function & Data sample (N={})".format(sample_size))
plt.legend(loc="upper left")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

20190426184220.png

# set kernel
sigma=0.2
kernel = RBFkernel(sigma=sigma)

# generate model
model = KernelRegression(kernel)

# fit data sample for the model
model.fit_kernel(x_sample, y_sample, lr=0.01, nb_epoch=1000, log_freq=100)
  epoch: 0      MSE of sample data: 35.2988
  epoch: 100 	 MSE of sample data: 30.3570
  epoch: 200 	 MSE of sample data: 29.4241
  epoch: 300 	 MSE of sample data: 28.8972
  epoch: 400 	 MSE of sample data: 28.5597
  epoch: 500 	 MSE of sample data: 28.3322
  epoch: 600 	 MSE of sample data: 28.1736
  epoch: 700 	 MSE of sample data: 28.0598
  epoch: 800 	 MSE of sample data: 27.9759
  epoch: 900 	 MSE of sample data: 27.9122

# check Gram Matrix of the model
model.gram_matrix
  array([[1.00000000e+000, 3.24082380e-012, 1.86291046e-092, ...,
        3.45570112e-030, 7.50777061e-016, 6.18611768e-129],
       [3.24082380e-012, 1.00000000e+000, 5.11649994e-039, ...,
        1.78993799e-078, 1.05138357e-053, 1.15421467e-063],
       [1.86291046e-092, 5.11649994e-039, 1.00000000e+000, ...,
        6.88153992e-226, 4.47677881e-182, 8.98951607e-004],
       ...,
       [3.45570112e-030, 1.78993799e-078, 6.88153992e-226, ...,
        1.00000000e+000, 4.28581263e-003, 2.58161981e-281],
       [7.50777061e-016, 1.05138357e-053, 4.47677881e-182, ...,
        4.28581263e-003, 1.00000000e+000, 3.95135557e-232],
       [6.18611768e-129, 1.15421467e-063, 8.98951607e-004, ...,
        2.58161981e-281, 3.95135557e-232, 1.00000000e+000]])

# check loss
print(model.loss)
  12.333081499130776

# array for plot (predicted function)
x_new = np.linspace(-5, 5, 100)
y_new = model.predict(x_new)

# plot result
plt.plot(x_plot, y_plot, color="blue", linestyle="dotted", label="actual function")
plt.scatter(x_sample, y_sample, alpha=0.3, color="blue", label="data sample")
plt.plot(x_new, y_new, color="red", label="predicted function")
plt.title("Kernel Regression \n RBF kernel (sigma={}), sample size ={}".format(sigma, sample_size))
plt.legend(loc="upper left")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

20190426185334.png


That's all. Thank you for reading to the end!
