[PYTHON] Gaussian process regression using GPy

What is Gaussian process regression?

Based on the output `y``` corresponding to the input` `x``` that has already been sampled Create a regression model that returns the expected value and variance of the output prediction value y'``` for the new input `` x'```. It is used to predict the true shape of a function from a limited number of sample points. https://jp.mathworks.com/help/stats/gaussian-process-regression-models.html

To work with Gaussian process regression models, we use a python library called GPy. https://gpy.readthedocs.io/en/deploy/#

Drawing a true function

Let the inputs be two-dimensional, and the true function is the sum of the values passed through the cosine function.

temp.py


import numpy as np

#Function definition
def func(x):
    fx = np.sum(np.cos(2 * np.pi * x))
    return fx

xa = np.linspace(-1, 1, 101)
ya = np.linspace(-1, 1, 101)
Xa, Ya = np.meshgrid(xa, ya)

Za = np.zeros([101, 101])
for i in range(len(Xa)):
    for j in range(len(Ya)):
        x = np.array([Xa[i,j], Ya[i,j]])
        Za[i,j] = func(x)

#drawing
import matplotlib.pyplot as plt
fig1 = plt.figure(figsize=(8,8))
ax1 = fig1.add_subplot(111)
ax1.contour(Xa, Ya, Za, cmap="jet", levels=10, alpha=1)
plt.xlim(-1,1)
plt.ylim(-1,1)

image.png

sampling

Sobol sequence, Latin hypercube sampling, etc. are available as methods for sampling without waste. We do not use them, and here we simply randomly determine the sample points. The number of sample points should be 20 at the beginning.

temp.py


import random
random.seed(1)

#Random sampling
n_sample = 20
Xa_rand = [random.random()* 2 - 1 for i in range(n_sample)]
Ya_rand = [random.random()* 2 - 1 for i in range(n_sample)]

xlist = np.stack([Xa_rand, Ya_rand], axis=1)
Za_rand = []
for x in xlist:
    Za_rand = np.append(Za_rand, func(x))

#drawing
ax1.scatter(Xa_rand, Ya_rand)

Plot the sample points in the previous figure. The lower half is still better, but the upper half has few samples and is squishy. image.png

Gaussian process regression

Build a Gaussian process regression model.

GPy.Select a kernel function with kern. Here, it is a two-dimensional RBF kernel.



#### **`GPy.models.Build a regression model with GPRegression and model.Tune the model parameters with optimize.`**

temp.py


import GPy

#Training data
Input = np.stack([Xa_rand, Ya_rand], axis=1)
Output = Za_rand[:,None]

#Build Gaussian process regression model
kernel = GPy.kern.RBF(2)
model = GPy.models.GPRegression(Input, Output, kernel)
model.optimize(messages=True, max_iters=1e5)

#drawing
model.plot(levels=10)
plt.gcf().set_size_inches(8, 8, forward=True)
plt.xlim(-1,1)
plt.ylim(-1,1)
plt.xlabel("x1")
plt.ylabel("x2")

Plot the response surface. Although the score was quite low at 20 points, the rough mountains and valleys were surprisingly reproduced. The error in the upper half is large. image.png

Confidence interval drawing

Fix one of the two-dimensional inputs to 0 and look at the cross section of the response surface.

temp.py


# x2=0 cross section
model.plot(fixed_inputs=[(1, 0)])
plt.xlim(-1,1)
plt.ylim(-4,4)
plt.xlabel("x1")

# x1=0 cross section
model.plot(fixed_inputs=[(0, 0)])
plt.xlim(-1,1)
plt.ylim(-4,4)
plt.xlabel("x2")

x2 = 0 cross section image.png

x1 = 0 cross section image.png

The light blue band indicates the 2.5-97.5% confidence interval. The wider the confidence interval, the greater the variability in the regression results. It seems that the big part of x2 is still not confident.

When the number of samplings is increased

n_sample = 40 image.png image.png image.png

n_sample = 100 image.png image.png image.png

As the number of samplings increases, the width of the confidence interval becomes narrower and the variation in the regression results becomes smaller.

Summary

A Gaussian process regression model was constructed using GPy.

Recommended Posts

Gaussian process regression using GPy
Regression using Gaussian process
Gaussian process regression Numpy implementation and GPy
Gaussian process
PRML Chapter 6 Gaussian Process Regression Python Implementation
Gaussian process regression with PyMC3 Personal notes
Gaussian process with pymc3
Linear regression method using Numpy
Process on GPU using chainer.cuda.elementwise
Implement Gaussian process in Pyro
"Gaussian process and machine learning" Gaussian process regression implemented only with Python numpy
DB table insertion process using sqlalchemy
[For beginners] Process monitoring using cron