PRML Chapter 1 Bayesian Curve Fitting Python Implementation

I decided to pick up some of the methods introduced in PRML and implement them in Python as a study exercise. The ground rule is that, apart from the Python standard library, only numpy (and, where needed, scipy) may be used for the algorithm itself. matplotlib and the like are exceptions, since they have nothing to do with the algorithmic part.

Bayesian curve fitting

Bayesian curve fitting is a method that treats curve fitting in a fully Bayesian manner. In the predictive distribution obtained by maximum likelihood estimation, the variance is the same at every point, whereas Bayesian curve fitting derives the predictive distribution in a Bayesian way (using the sum and product rules of probability) from the posterior distribution of the parameters. This yields a variance at each point, telling you how confident the model is about its prediction there.
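To make the contrast concrete (notation follows PRML; $y(x, {\bf w}_{\rm ML})$ denotes the curve fitted by maximum likelihood): the maximum likelihood predictive distribution is

$$ p(t|x, {\bf w}_{\rm ML}, \beta_{\rm ML}) = \mathcal{N}(t|y(x, {\bf w}_{\rm ML}), \beta_{\rm ML}^{-1}) $$

whose variance $\beta_{\rm ML}^{-1}$ does not depend on $x$, while the Bayesian predictive distribution derived below has the form $\mathcal{N}(t|m(x), s^2(x))$ with an $x$-dependent variance $s^2(x)$.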

Implementation

As stated at the very beginning, the algorithm implementation uses numpy.

import matplotlib.pyplot as plt
import numpy as np

Calculation of design matrix

{\bf\Phi} = \begin{bmatrix}
\phi_0(x_0) & \phi_1(x_0) & \cdots & \phi_{M-1}(x_0) \\
\phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_{M-1}(x_1) \\
\vdots & \vdots & \ddots & \vdots \\
\phi_0(x_{N-1}) & \phi_1(x_{N-1}) & \cdots & \phi_{M-1}(x_{N-1})
\end{bmatrix}
class PolynomialFeatures(object):

    def __init__(self, degree):
        self.degree = degree

    def transform(self, x):
        # Each column is x raised to a power 0, 1, ..., degree,
        # so each row is the feature vector phi(x_n).
        features = [x ** i for i in range(self.degree + 1)]
        return np.array(features).transpose()
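For instance (an illustrative check, not part of the original script), degree-2 features for three input points give a 3-by-3 design matrix:

features = PolynomialFeatures(degree=2)
Phi = features.transform(np.array([0.0, 0.5, 1.0]))
# Phi == [[1.  0.   0.  ]
#         [1.  0.5  0.25]
#         [1.  1.   1.  ]]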

Bayesian curve fitting estimation and prediction

First, estimate the posterior distribution of the parameters ${\bf w}$:

$$ p({\bf w}|{\bf x}, {\bf t}, \alpha, \beta) \propto p({\bf t}|{\bf\Phi}{\bf w}, \beta)\,p({\bf w}|\alpha) $$

Here, ${\bf x}$ and ${\bf t}$ are the training set. Both distributions on the right-hand side are assumed to be Gaussian, with precision parameters $\beta$ (likelihood) and $\alpha$ (prior) respectively, so the distribution on the left-hand side is also Gaussian:

$$ p({\bf w}|{\bf x}, {\bf t}, \alpha, \beta) = \mathcal{N}({\bf m}, {\bf S}) $$

where

$$ {\bf m} = \beta{\bf S}{\bf\Phi}^{\rm T}{\bf t},~~{\bf S}^{-1} = \alpha{\bf I} + \beta{\bf\Phi}^{\rm T}{\bf\Phi} $$

Using this posterior distribution, the probability distribution of $t$ at a new point $x$ is

$$ p(t|x,{\bf x}, {\bf t}, \alpha, \beta) = \int p(t|x,{\bf w},\beta)\,p({\bf w}|{\bf x}, {\bf t}, \alpha, \beta)\,{\rm d}{\bf w} $$

which again turns out to be a Gaussian distribution:

$$ p(t|x,{\bf x}, {\bf t}, \alpha, \beta) = \mathcal{N}\!\left(t\,\middle|\,{\bf m}^{\rm T}{\bf\phi}(x),~{1\over\beta}+{\bf\phi}(x)^{\rm T}{\bf S}{\bf\phi}(x)\right) $$

class BayesianRegression(object):

    def __init__(self, alpha=0.1, beta=0.25):
        self.alpha = alpha  # precision of the prior over w
        self.beta = beta    # precision of the observation noise

    def fit(self, X, t):
        # S = (alpha I + beta Phi^T Phi)^{-1}
        self.w_var = np.linalg.inv(
            self.alpha * np.identity(np.size(X, 1))
            + self.beta * X.T.dot(X))
        # m = beta S Phi^T t
        self.w_mean = self.beta * self.w_var.dot(X.T.dot(t))

    def predict(self, X):
        # Predictive mean m^T phi(x) and variance 1/beta + phi(x)^T S phi(x)
        y = X.dot(self.w_mean)
        y_var = 1 / self.beta + np.sum(X.dot(self.w_var) * X, axis=1)
        y_std = np.sqrt(y_var)
        return y, y_std
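A minimal usage sketch (the data here is made up purely for illustration):

X = PolynomialFeatures(degree=1).transform(np.array([0.0, 1.0]))
t = np.array([0.0, 1.0])
regression = BayesianRegression(alpha=0.1, beta=0.25)
regression.fit(X, t)
y, y_std = regression.predict(X)  # per-point predictive mean and standard deviation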

Whole code

The code below is the full implementation, including the classes above. The general flow is:

  1. Create training data (x, t = create_toy_data(func, ...))
  2. Compute the posterior distribution of the weight parameters ${\bf w}$ (regression.fit(X, t))
  3. Compute the predictive distribution using that posterior (y, y_std = regression.predict(X_test))
  4. Plot the results

bayesian_curve_fitting.py


import matplotlib.pyplot as plt
import numpy as np


class PolynomialFeatures(object):

    def __init__(self, degree):
        self.degree = degree

    def transform(self, x):
        features = [x ** i for i in range(self.degree + 1)]
        return np.array(features).transpose()


class BayesianRegression(object):

    def __init__(self, alpha=0.1, beta=0.25):
        self.alpha = alpha
        self.beta = beta

    def fit(self, X, t):
        self.w_var = np.linalg.inv(
            self.alpha * np.identity(np.size(X, 1))
            + self.beta * X.T.dot(X))
        self.w_mean = self.beta * self.w_var.dot(X.T.dot(t))

    def predict(self, X):
        y = X.dot(self.w_mean)
        y_var = 1 / self.beta + np.sum(X.dot(self.w_var) * X, axis=1)
        y_std = np.sqrt(y_var)
        return y, y_std


def create_toy_data(func, low=0, high=1, size=10, sigma=1.):
    # Sample x uniformly and add Gaussian noise to func(x)
    x = np.random.uniform(low, high, size)
    t = func(x) + np.random.normal(scale=sigma, size=size)
    return x, t


def main():

    def func(x):
        return np.sin(2 * np.pi * x)

    x, t = create_toy_data(func, low=0, high=1, size=10, sigma=0.25)
    plt.scatter(x, t, s=50, marker='o', alpha=0.5, label="observation")

    features = PolynomialFeatures(degree=5)

    regression = BayesianRegression(alpha=1e-3, beta=2)
    X = features.transform(x)
    regression.fit(X, t)

    x_test = np.linspace(0, 1, 100)
    X_test = features.transform(x_test)
    y, y_std = regression.predict(X_test)

    plt.plot(x_test, func(x_test), color='blue', label=r"sin($2\pi x$)")
    plt.plot(x_test, y, color='red', label="predict_mean")
    plt.fill_between(x_test, y - y_std, y + y_std,
                     color='pink', alpha=0.5, label="predict_std")
    plt.legend()
    plt.title("Predictive distribution")
    plt.xlabel("x")
    plt.ylabel("t")
    plt.show()


if __name__ == '__main__':
    main()

Result

Bayesian curve fitting was performed on data points (blue) generated by adding noise to the function ${\rm sin}(2\pi x)$. The mean of the predictive distribution at each $x$ is shown in red, and the one-standard-deviation band in pink. The width of the pink band corresponds to the uncertainty of the prediction; had we computed the predictive distribution by maximum likelihood estimation instead, that width would be constant regardless of $x$, which is unintuitive. In this result, the prediction is relatively uncertain in the regions with few data points ($x \le 0.1$, $0.7 \le x \le 0.9$), while in the region $0.4 \le x \le 0.6$ the standard deviation does not grow much even where data points are sparse. Such behavior also depends on what kind of feature vector is used: this time we used polynomial features, but features based on Gaussian functions are also often used.

(Figure: predictive_distribution.png)
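As an example of the latter, here is a minimal sketch of Gaussian basis features that could be swapped in for PolynomialFeatures above (the class name, the choice of centers, and the width parameter are my own, not from the original code):

class GaussianFeatures(object):

    def __init__(self, means, width):
        self.means = means  # centers mu_j of the Gaussian bases
        self.width = width  # common scale s

    def transform(self, x):
        # phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), preceded by a bias term
        basis = [np.ones_like(x)]
        basis.extend(np.exp(-0.5 * ((x - m) / self.width) ** 2)
                     for m in self.means)
        return np.array(basis).transpose()

For example, GaussianFeatures(means=np.linspace(0, 1, 9), width=0.1) spreads nine bases evenly over $[0, 1]$.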

Finally

This time I chose hyperparameters $\alpha$ and $\beta$ that seemed to work, but it seems the hyperparameters can be estimated from the data alone using the evidence approximation introduced in Chapter 3 of PRML. I'm thinking of implementing that as well.
