[PYTHON] Kernel-Regression nur mit Numpy

Die Kernel-Methode ist eine leistungsstarke Waffe für die nichtlineare Datenanalyse. Es erscheint häufig als Grundelement von Algorithmen wie SVM mit weichem Rand, Gauß-Prozess und Clustering. In diesem Beitrag habe ich versucht, das Verfahren zur Lösung des Regressionsproblems mithilfe der Kernel-Methode in Python zu reproduzieren.

TL;DR Please check this repository and see codes.

Informationen zur Kernel-Methode finden Sie in den folgenden Büchern.

Was ist Kernel-Regression?

Kernelfunktion und Grammmatrix

$ n $ Datenbeispiel $ \ mathcal {D} = \ left \ {x ^ {(i)}, y ^ {(i)} \ right \} , (i = 1, \ dots n) Nähern Sie von $ aus die Funktion $ y = f (x) $. Das Regressionsmodell wird durch die folgende Gleichung unter Verwendung der Kernelfunktion $ k (\ cdot, \ cdot) $ und des Parameters $ {\ alpha} _i , (i = 1 \ dots n) $ ausgedrückt.

\hat{y}^{(i)} = \sum\_{j=1}^{n} \alpha_j k(x^{(i)}, x^{(j)}) ~~~~ (i=1 \dots n)

Der Abstand zwischen dem geschätzten Wert $ \ hat {y} ^ {(i)} $ und dem wahren Wert $ y ^ {(i)} $ wird als Residuum berechnet. Passen Sie das Modell an die Datenstichprobe an, leiten Sie die Summe (oder den Durchschnitt) der Residuen als Verlustfunktion des Modells ab und finden Sie die optimalen Parameter $ {\ alpha} _i , (i = 1 \ dots n) $.

\begin{align} residual(y^{(i)}, \hat{y}^{(i)}) &= y^{(i)} - \hat{y}^{(i)} \\\ &= y^{(i)} - \sum\_{j=1}^{n} \alpha\_j k(x^{(i)}, x^{(j)}) \end{align}

\begin{align} L(\alpha) &= \frac{1}{n} \sum\_{i=1}^{n} {residual(y^{(i)}, \hat{y}^{(i)}) }^{2} \\\ &= \frac{1}{n} \sum\_{i=1}^{n} { \left( y^{(i)} - \sum\_{j=1}^{n} \alpha\_j k(x^{(i)}, x^{(j)}) \right) }^{2} \\\ &= {({\bf y} - {\bf K} {\bf \alpha})}^{\mathrm{T}} ({\bf y} - {\bf K} {\bf \alpha}) \end{align}

where, \,\,\, {\bf K} = \begin{pmatrix} k(x^{(1)}, x^{(1)}) & \cdots & k(x^{(1)}, x^{(n)}) \\\ \vdots & \ddots & \vdots \\\ k(x^{(n)}, x^{(1)}) & \cdots & k(x^{(n)}, x^{(n)}) \end{pmatrix}

$ {\ bf K} $ ist eine Matrix, die aus einer Datenprobe mit einer Kernelfunktion berechnet und als Gramm-Matrix bezeichnet wird.

Parametersuche nach Gradientenabstiegsmethode

Die Methode des naiven Gradientenabstiegs wird verwendet, um nach dem Parameter $ {\ alpha} \ _ i , (i = 1 \ dots n) $ zu suchen. Der Koeffizient $ \ gamma $ ist die Lernrate.

\frac{\partial L(\alpha)}{\partial {\bf \alpha}} = -2 \cdot {\bf K}^{\mathrm{T}} ({\bf y} - {\bf K} {\bf \alpha}) \\\ {\alpha}\_{new} = {\alpha}\_{old} + \gamma \cdot \frac{\partial L(\alpha)}{\partial {\bf \alpha}} \mid\_{ \alpha = {\alpha}\_{old} }

Der RBF-Kernel (Gaußscher Kernel) wird als Kernelfunktion übernommen.

k(x, x') := \exp \left( {\large - \frac{{|| x - x' ||}^2}{2 {\sigma}^{2}} } \right)

Implementieren Sie die Kernel-Regression in Python (nur numpy)

Der gesamte Code befindet sich im folgenden Repository. https://github.com/yumaloop/KernelRegression

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Kernel Function
class RBFkernel():
    def __init__(self, sigma=0.5):
        self.sigma = sigma
        
    def __call__(self, x, y):
        numerator = -1 * np.sum((x - y)**2)
        denominator = 2 * (self.sigma**2)
        return np.exp(numerator / denominator)
    
    def get_params(self):
        return self.sigma
    
    def set_params(self, sigma):
        self.sigma = sigma

# Regression Model
class KernelRegression():
    def __init__(self, kernel):
        self.kernel = kernel
        
    def fit_kernel(self, X, y, lr=0.01, nb_epoch=1000, log_freq=50):
        self.X = X
        self.y = y
        self.n = X.shape[0] # sample size
        self.alpha = np.full(self.n, 1) # param alpha: initialize
        self.gram_matrix = np.zeros((self.n, self.n))
        
        # Gradient Descent Algorithm to optimize alpha
        for epoch in range(nb_epoch):
            
            # Gram Matrix
            for i in range(self.n):
                for j in range(self.n):
                    self.gram_matrix[i][j] = self.kernel(self.X[i], self.X[j])
                    self.loss, self.loss_grad = self.mse(self.X, self.y, self.alpha, self.gram_matrix)
                    self.alpha = self.alpha - lr * self.loss_grad
                    
            if epoch % log_freq == 0:
                print("epoch: {} \t MSE of sample data: {:.4f}".format(epoch, self.loss))
                        
                        
    def mse(self, X, y, alpha, gram_matrix):
        loss = np.dot((y - np.dot(gram_matrix, alpha)), (y - np.dot(gram_matrix, alpha)))
        loss_grad = -2 * np.dot(gram_matrix.T, (y - np.dot(gram_matrix, alpha)))
        return loss, loss_grad
    
    def predict(self, X_new):
        n_new = X_new.shape[0]
        y_new = np.zeros(n_new)
        for i in range(n_new):
            for j in range(self.n):
                y_new[i] += self.alpha[j] * self.kernel(X_new[i], self.X[j])
        return y_new

# Experiment

def actual_function(x):
    return 1.7 * np.sin(2 * x) + np.cos(1.5 * x) + 0.5 * np.cos(0.5 * x) + 0.5 * x  + 0.1

sample_size = 100 # the number of data sample
var_noise = 0.7 # variance of the gaussian noise for samples

# make data sample
x_sample = np.random.rand(sample_size) * 10 - 5
y_sample = actual_function(x_sample) + np.random.normal(0, var_noise, sample_size)

# variables for plot (actual function)
x_plot = np.linspace(-5, 5, 100)
y_plot = actual_function(x_plot)

# plot figure
plt.plot(x_plot, y_plot, color="blue", linestyle="dotted", label="actual function")
plt.scatter(x_sample, y_sample, alpha=0.4, color="blue", label="data sample")
plt.title("Actual function & Data sample (N={})".format(sample_size))
plt.legend(loc="upper left")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

# set kernel
sigma=0.2
kernel = RBFkernel(sigma=sigma)

# generate model
model = KernelRegression(kernel)

# fit data sample for the model
model.fit_kernel(x_sample, y_sample, lr=0.01, nb_epoch=1000, log_freq=100)
  epoch: 0      MSE of sample data: 35.2988
  epoch: 100 	 MSE of sample data: 30.3570
  epoch: 200 	 MSE of sample data: 29.4241
  epoch: 300 	 MSE of sample data: 28.8972
  epoch: 400 	 MSE of sample data: 28.5597
  epoch: 500 	 MSE of sample data: 28.3322
  epoch: 600 	 MSE of sample data: 28.1736
  epoch: 700 	 MSE of sample data: 28.0598
  epoch: 800 	 MSE of sample data: 27.9759
  epoch: 900 	 MSE of sample data: 27.9122

# check Gram Matrix of the model
print(model.gram_matrix)
  array([[1.00000000e+000, 3.24082380e-012, 1.86291046e-092, ...,
        3.45570112e-030, 7.50777061e-016, 6.18611768e-129],
       [3.24082380e-012, 1.00000000e+000, 5.11649994e-039, ...,
        1.78993799e-078, 1.05138357e-053, 1.15421467e-063],
       [1.86291046e-092, 5.11649994e-039, 1.00000000e+000, ...,
        6.88153992e-226, 4.47677881e-182, 8.98951607e-004],
       ...,
       [3.45570112e-030, 1.78993799e-078, 6.88153992e-226, ...,
        1.00000000e+000, 4.28581263e-003, 2.58161981e-281],
       [7.50777061e-016, 1.05138357e-053, 4.47677881e-182, ...,
        4.28581263e-003, 1.00000000e+000, 3.95135557e-232],
       [6.18611768e-129, 1.15421467e-063, 8.98951607e-004, ...,
        2.58161981e-281, 3.95135557e-232, 1.00000000e+000]])

# check loss
print(model.loss)
  12.333081499130776

# array for plot (predicted function)
x_new = np.linspace(-5, 5, 100)
y_new = model.predict(x_new)

 plot result
plt.plot(x_plot, y_plot, color="blue", linestyle="dotted", label="actual function")
plt.scatter(x_sample, y_sample, alpha=0.3, color="blue", label="data sample")
plt.plot(x_new, y_new, color="red", label="predicted function")
plt.title("Kernel Regression \n RBF kernel (sigma={}), sample size ={}".format(sigma, sample_size))
plt.legend(loc="upper left")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

das ist alles. Vielen Dank für das Lesen bis zum Ende!