[PYTHON] Regression using Gaussian process

Overview

Hello, this is kwashi. With the recent popularity of machine learning, it is increasingly applied in industrial settings, for example using predictions from a regression model for control. However, in contexts where the product is critical, the uncertainty of a prediction also matters. For example, suppose you want to regress the parameters of a production device from the factory environment (temperature, etc.). Even if a pattern that never appeared in the training data suddenly occurs, the model will still output some parameter values. Therefore, the uncertainty of the inference, that is, how reliable the regression is, is also important.

Bayesian statistics expresses this uncertainty as a probability distribution. It is used in various fields such as autonomous driving and audio processing. This article provides an example of regression using Bayesian statistics, taking a simple model, the sin function, as the regression target. First, an example of polynomial regression is shown. However, a polynomial cannot express the sin function well unless enough parameters are set. Therefore, we develop the approach into a method that estimates the function itself using a Gaussian process.

This article uses Python's PyMC3 and SciPy libraries.

This article does not cover Bayesian statistics in depth. It was written with reference to the following books, so please read them: [Continued: Easy-to-Understand Pattern Recognition - An Introduction to Unsupervised Learning](https://www.amazon.co.jp/%E7%B6%9A%E3%83%BB%E3%82%8F%E3%81%8B%E3%82%8A%E3%82%84%E3%81%99%E3%81%84%E3%83%91%E3%82%BF%E3%83%BC%E3%83%B3%E8%AA%8D%E8%AD%98%E2%80%95%E6%95%99%E5%B8%AB%E3%81%AA%E3%81%97%E5%AD%A6%E7%BF%92%E5%85%A5%E9%96%80%E2%80%95-%E7%9F%B3%E4%BA%95-%E5%81%A5%E4%B8%80%E9%83%8E/dp/427421530X) and Bayesian Analysis with Python.

Polynomial regression of Sin function

In this chapter, data is generated from a normal distribution with mean sin(x) and standard deviation 0.2, where the angle x is drawn at random from a uniform distribution. A fifth-order polynomial is then fitted by the least squares method. The fifth-order polynomial is:

f(x) = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4 + b_5 x^5

First, the results are shown below. The green line is the sin function, the blue dots are the data used for training, and the red line is the result of simply fitting a fifth-order polynomial by the least squares method.

The program is as follows. Fitting by the least squares method is performed using scipy.


import numpy as np
from scipy.odr import odrpack as odr
from scipy.odr import models
import matplotlib.pyplot as plt
import theano.tensor as tt
import scipy.stats as stats
import pymc3 as pm

#Generate training data from the sin function
np.random.seed(5)
x = np.random.uniform(0, 10, size=20)
y = np.random.normal(np.sin(x), 0.2) #normal distribution (mean, standard deviation)

#Fitting with scipy (orthogonal distance regression module)
polyFunc = models.polynomial(5) #fifth-order polynomial
data = odr.Data(x, y)
myodr = odr.ODR(data, polyFunc, maxit=200)

myodr.set_job(fit_type=2) #ordinary least squares

fit = myodr.run()

coeff = fit.beta[::-1]
err = fit.sd_beta[::-1]
print(coeff) #[-2.48756876e-05 -1.07573181e-02  2.09111045e-01 -1.21915241e+00  2.11555200e+00 -1.08779827e-01]
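The figure described earlier is not produced by the listing above. A minimal plotting sketch could look like the following; the plotting details are my own additions, and np.polyval is used because coeff is already ordered from the highest-degree term:

# Hypothetical plotting sketch for the figure described above
xp = np.linspace(0, 10, 200)
plt.plot(xp, np.sin(xp), 'g-', label='sin(x)')            # true function
plt.plot(x, y, 'bo', label='training data')               # observed data
plt.plot(xp, np.polyval(coeff, xp), 'r-', label='5th-order least squares fit')
plt.legend()
plt.show()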

Bayesian polynomial regression

In this chapter, we fit the fifth-order polynomial in a Bayesian framework, using the data generated in the previous chapter. The model is written as in the program below. For the likelihood we use Student's t distribution,

{\rm St}(x|\mu,\lambda,\nu)={\Gamma(\nu/2+1/2)\over\Gamma(\nu/2)}\left({\lambda\over\pi\nu}\right)^{1/2}\left[1+{\lambda(x-\mu)^2\over\nu}\right]^{-\nu/2-1/2}

and the parameters are then learned with the Markov chain Monte Carlo method (MCMC).

with pm.Model() as model:
  b0 = pm.Normal('b0', mu=0, sd=10)
  b1 = pm.Normal('b1', mu=0, sd=5)
  b2 = pm.Normal('b2', mu=0, sd=5)
  b3 = pm.Normal('b3', mu=0, sd=5)
  b4 = pm.Normal('b4', mu=0, sd=5)
  b5 = pm.Normal('b5', mu=0, sd=5)

  #half-Cauchy prior, x ∈ [0, ∞) https://docs.pymc.io/api/distributions/continuous.html#pymc3.distributions.continuous.HalfCauchy
  sd = pm.HalfCauchy('sd', 5)
  mu = b0 + b1 * x + b2 * x**2 + b3 * x**3 + b4 * x**4 + b5 * x**5
  nu = pm.Exponential('nu', 1/30)

  y_pred = pm.StudentT('y_pred', mu=mu, sd=sd, nu=nu, observed=y)
  step = pm.Metropolis()
  trace = pm.sample(100000,step=step)

The polynomial built from the estimated parameters is shown in red below. The green line is the sin function, the blue dots are the data used for training, and the blue line is the least squares result. The posterior mean of sd obtained from MCMC was 0.23, so the noise level of the generative model could be estimated. However, the error from the actual sin function is large, and even in regions with little data, such as around x = 10, the predictive spread is generated from the constant value 0.23. This is a poor model design.
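As a reference, a minimal sketch of how such a plot and the sd estimate could be obtained from the trace above; the burn-in length and plotting details are assumptions, not part of the original program:

# Hypothetical post-processing sketch: posterior-mean polynomial from the MCMC trace
burnin = 50000  # assumed burn-in; adjust to the actual chain
b = [trace[name][burnin:].mean() for name in ['b0', 'b1', 'b2', 'b3', 'b4', 'b5']]

xp = np.linspace(0, 10, 200)
yp = b[0] + b[1]*xp + b[2]*xp**2 + b[3]*xp**3 + b[4]*xp**4 + b[5]*xp**5

plt.plot(xp, np.sin(xp), 'g-', label='sin(x)')
plt.plot(x, y, 'bo', label='training data')
plt.plot(xp, yp, 'r-', label='Bayesian polynomial (posterior mean)')
plt.legend()
plt.show()

print(trace['sd'][burnin:].mean())  # posterior mean of sd (reported as about 0.23 in the text)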

Extending to a Gaussian process

In the previous chapter, the function f(x) was represented by a polynomial model and its parameters were inferred. However, the data was not represented well, and there is a good chance that the model itself is wrong. Therefore, we use a method that infers the function directly. That is, in the previous chapter we performed parameter estimation, p(θ|x), whereas in this chapter we infer p(f|x). We use a Gaussian process (GP) as the method for inferring this function.

The formal definition of a GP is as follows: every point in a continuous space is associated with a random variable that follows a normal distribution, and a GP is the joint distribution of this infinitely large number of random variables.

But we will not go into a particularly difficult discussion here. There are four important points:

  1. The prior distribution is a multivariate normal distribution: f(x) ~ MvNormal(u | K).
  2. The covariance matrix (K) is the kernel.
  3. The covariance matrix has parameters to be estimated.
  4. The posterior distribution can be calculated analytically.

Think of the kernel in point 2 as the kind of kernel used in support vector machines and similar methods. There are various kernels, and any of them can be used as the covariance matrix. This article uses the Gaussian kernel. Let's look at the GP prior obtained from points 1 and 2. The following program plots samples from a multivariate normal distribution (mean = 0, covariance matrix = Gaussian kernel). Looking at the figure, you can see that it represents various functions, and there are infinitely many such expressions.

#Matrix of squared distances: element [j][i] = (x[i] - y[j])**2
def squareDistance(x, y):
  return np.array([[(x[i]-y[j])**2 for i in range(len(x))] for j in range(len(y))])

np.random.seed(1)
tp = np.linspace(0, 10, 100)

cov = np.exp(-squareDistance(tp, tp)) + 1e-6 * np.eye(len(tp)) #Gaussian kernel (small jitter added for numerical stability)
plt.plot(tp, stats.multivariate_normal.rvs(cov=cov, size=6).T)
plt.show()

A multivariate normal distribution like the one in the figure above can express various functions by changing the covariance matrix. To estimate the covariance matrix within the Bayesian framework, we need to introduce parameters to be estimated, as stated in point 3. So the covariance matrix is defined as

K_{i,j} = \eta \exp(-\rho D_{i,j})  \ \  (i \neq j) \\
K_{i,j} = \eta + \sigma \ \ (i = j)

where D_{i,j} is the squared distance between the i-th and j-th inputs.

Finally, point 4: the posterior distribution can be calculated analytically. If the likelihood is a normal distribution, the GP posterior can be obtained analytically, so the multivariate normal distribution that is the GP posterior can be computed with a simple formula. This posterior distribution makes it possible to predict the regression function.
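As a rough sketch, assuming a zero prior mean and the noise term σ absorbed into the diagonal of K as in the program below, the posterior over the test points is

\mu_{\rm post} = K_{s} K^{-1} y \\
\Sigma_{\rm post} = K_{ss} - K_{s} K^{-1} K_{s}^{T}

where K_s is the kernel matrix between the test points and the training inputs and K_ss is the kernel matrix among the test points, matching K_s and K_ss in the code.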

A detailed explanation of the formula can be found in [Gaussian Process and Machine Learning] Gaussian Process Regression.

The prediction results and part of the program are shown below. The red lines are predictions drawn from the estimated GP posterior, the green line is the sin function, the blue points are the data used for training, and the blue line is the least squares result. As a result of prediction by the GP, the red lines are closer to the green line than the blue line is, so a regression model with good performance could be estimated directly, without tuning polynomial parameters. Also, looking at the spread of the red lines, you can see that they scatter widely, that is, the variance is large, in regions with no data, such as around x = 10.

#GP prior distribution: f(x) ~ GP(u=(u1, u2, ...), k(x, x')), assuming u1, u2, ... = 0
#Likelihood: p(y|x, f(x)) ~ N(f, σ^2I)
#GP posterior distribution: p(f(x)|x, y) ~ GP(U, Σ)

def squareDistance(x, y):
  return np.array([[(x[i]-y[j])**2 for i in range(len(x))] for j in range(len(y))])

xmin, xmax, xsize = 0, 10, 100 #range and number of reference (test) points; assumed values, not shown in the original program
points = np.linspace(xmin, xmax, xsize) #Reference points

with pm.Model() as model:
  #Prior distribution
  mu = np.zeros(len(x)) #zero prior mean over the training inputs
  eta = pm.HalfCauchy('eta', 5)
  rho = pm.HalfCauchy('rho', 5)
  sigma = pm.HalfCauchy('sigma', 5)

  D = squareDistance(x, x)
  K = tt.fill_diagonal(eta * pm.math.exp(-rho * D), eta + sigma) #store eta + sigma on the diagonal (i = j)

  gpPri = pm.MvNormal('obs', mu, cov=K, observed=y)

  #Calculation of the posterior distribution (for a Gaussian likelihood the GP posterior can be computed in closed form)
  K_ss = eta * pm.math.exp(-rho * squareDistance(points, points)) #kernel among the test points
  K_s = eta * pm.math.exp(-rho * squareDistance(x, points))       #kernel between test points and training data

  MuPost = pm.Deterministic('muPost', pm.math.dot(pm.math.dot(K_s, tt.nlinalg.matrix_inverse(K)), y)) #K_s.T inv(K) y
  SigmaPost = pm.Deterministic('sigmaPost', K_ss - pm.math.dot(pm.math.dot(K_s, tt.nlinalg.matrix_inverse(K)), K_s.T))

  step = pm.Metropolis()
  trace = pm.sample(10000,step=step) 
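The plotting part is omitted from the listing above. A minimal sketch of drawing the red prediction lines from the estimated posterior might look like the following; the burn-in, jitter term, and number of curves are assumptions:

# Hypothetical plotting sketch: draw functions from the estimated GP posterior
burnin = 5000  # assumed burn-in
muPost = trace['muPost'][burnin:].mean(axis=0)
sigmaPost = trace['sigmaPost'][burnin:].mean(axis=0)
sigmaPost = sigmaPost + 1e-6 * np.eye(len(points))  # small jitter for numerical stability

# sample a few curves from the posterior multivariate normal
samples = stats.multivariate_normal.rvs(mean=muPost, cov=sigmaPost, size=30, random_state=1)

plt.plot(points, samples.T, 'r-', alpha=0.3)        # GP posterior samples
plt.plot(points, np.sin(points), 'g-', label='sin(x)')
plt.plot(x, y, 'bo', label='training data')
plt.legend()
plt.show()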

Summary

In this article, we introduced Bayesian statistics because expressing uncertainty is important. However, when creating a regression model of the sin function, it was difficult to design a polynomial model that expresses the sin function well, so we introduced the Gaussian process. We avoided detailed derivations and focused on the concepts; for details, please refer to the articles and books referenced above.
