[PYTHON] I tried Bayesian optimization!

Overview

Using a Bayesian optimization library, I tried to see whether it could actually find the optimum of a function reasonably well.

The library used is "GPyOpt".

The following book was used as a reference for the theory: "Gaussian Processes and Machine Learning" (authors: Daichi Mochihashi, Shigeyuki Oba).

Experiment outline

Create an objective function and find its minimum value (within the specified domain) by Bayesian optimization. The detailed settings are as follows.

--Bayesian optimization library: GPyOpt (.methods.BayesianOptimization)
--Objective function: $y = f(x) = (x-300)(x-200)(x-15)(x-5)(x+6)(x+10)(x+100)$
--Optimization policy: minimization
--Domain (x): [-100, 300]
--Acquisition function: EI (expected improvement)
--Initial data (x, y): x = -50, 0, 50, 100, 150, 200, 250
--Kernel: not specified. (The library's default settings were not investigated.) This time I will leave the kernel aside and just look at the results.

Premise

--Bayesian optimization: a method for finding the minimum (or maximum) value of an objective function using Gaussian process regression. An acquisition function is applied to the prediction (probability distribution) obtained from Gaussian process regression, and the point where the acquisition function value is maximized is chosen as the next point to search.
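As a rough illustration of what an acquisition function does, here is a minimal sketch of EI (expected improvement) for minimization. The names mu, sigma, and f_best are mine, not GPyOpt's: they stand for the Gaussian process's predictive mean and standard deviation at candidate points, and the best objective value observed so far.

# Minimal sketch of EI (expected improvement) for minimization.
# mu, sigma: GP predictive mean / standard deviation at candidate points.
# f_best: best (smallest) objective value observed so far.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    sigma = np.maximum(sigma, 1e-12)  # guard against division by zero
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# The candidate with the largest EI is searched next.
mu = np.array([0.0, -1.0])
sigma = np.array([1.0, 0.5])
print(expected_improvement(mu, sigma, f_best=-0.5))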

--Gaussian process regression: a type of Bayesian inference. The objective function value $f(x)$ at each input point $x$ is regarded as a random variable, and $(f(x_1), f(x_2), \dots, f(x_n), \dots)$ is treated as following a multivariate normal distribution $N(\mu, \Sigma)$.

Given the observed input points $x_1, x_2, \dots, x_n$, the objective function value (a normal distribution) at a new input point $x_{n+1}$ is taken to be

$f(x_{n+1}) \sim p(f(x_{n+1}) \mid x_{n+1}, f(x_1), f(x_2), \dots, f(x_n))$

and regression is performed under this view. The expected value of this distribution is often used as the predicted value. (The function $f$ above is called a "Gaussian process".)
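As a concrete sketch (my own toy example, not from the book or from GPyOpt), the mean and variance of this conditional distribution can be computed directly with numpy. An RBF kernel is used only to make the sketch runnable; kernels are described in the next item.

# Toy sketch: conditional distribution of f(x_{n+1}) given observed points.
import numpy as np

def k(a, b, length=1.0):
    # RBF kernel (see the "Kernel" item below); an illustrative choice
    return np.exp(-(a - b) ** 2 / (2 * length ** 2))

X_obs = np.array([-1.0, 0.0, 2.0])   # observed inputs x_1, ..., x_n
y_obs = np.sin(X_obs)                # observed values f(x_1), ..., f(x_n)
x_new = 1.0                          # new input x_{n+1}

K = k(X_obs[:, None], X_obs[None, :])                 # covariance matrix Sigma
k_vec = k(X_obs, x_new)                               # covariances with x_new
K_inv = np.linalg.inv(K + 1e-8 * np.eye(len(X_obs)))  # jitter for stability

mean = k_vec @ K_inv @ y_obs                   # E[f(x_{n+1}) | data]
var = k(x_new, x_new) - k_vec @ K_inv @ k_vec  # Var[f(x_{n+1}) | data]
print(mean, var)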

--Kernel: in Gaussian process regression, a function that specifies each component of the variance-covariance matrix $\Sigma$ of $(f(x_1), f(x_2), \dots, f(x_n), \dots)$. The $(i, j)$ component, that is, the covariance of $f(x_i)$ and $f(x_j)$, is defined by a function $k(x_i, x_j)$ that depends on $x_i$ and $x_j$; this function $k$ is called the "kernel". Several kernels are commonly used, and their basic property is: "if the input points $x_i$ and $x_j$ are close, then $f(x_i)$ and $f(x_j)$ are also close."
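For example, the RBF (Gaussian) kernel, one common choice, expresses exactly this property: the covariance is close to 1 for nearby inputs and decays toward 0 as they move apart. (A small demonstration of my own; not necessarily GPyOpt's default.)

# RBF kernel: covariance decays with the distance between inputs.
import numpy as np

def rbf(x_i, x_j, length=1.0):
    return np.exp(-(x_i - x_j) ** 2 / (2 * length ** 2))

print(rbf(0.0, 0.1))  # close inputs   -> covariance near 1
print(rbf(0.0, 3.0))  # distant inputs -> covariance near 0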

Experiment details

1. Library import

#[document] https://gpyopt.readthedocs.io/en/latest/index.html
#[Source] https://github.com/SheffieldML/GPyOpt/blob/master/GPyOpt/methods/bayesian_optimization.py

# pip install GPyOpt
import GPyOpt

import matplotlib.pyplot as plt
import numpy as np

2. Objective function

#Objective function definition
def f(x):
    y = (x-300)*(x-200)*(x-15)*(x-5)*(x+6)*(x+10)*(x+100)
    return y

#Domain definition
xlim_fr = -100
xlim_to = 300

#Graph
x = [i for i in range(xlim_fr , xlim_to + 1)]
y = [f(_x) for _x in x]

figsize = (10 , 5)
fig , ax = plt.subplots(1 , 1 , figsize=figsize)
ax.set_title('Outline of f')
ax.grid()
ax.plot(x , y)
ax.set_xlim(xlim_fr , xlim_to)
plt.show()

From the graph, f appears to take its minimum within the domain at around x = 270. Let's see whether Bayesian optimization finds that point.

(Figure: obj_func.jpg)

3. Initial data

#Initial data
init_X = [i for i in range(-50 , 300 , 50)]
init_X_np = np.array(init_X).reshape((len(init_X) , 1))

init_Y = [f(i) for i in init_X]
init_Y_np = np.array(init_Y).reshape((len(init_Y) , 1))

print(len(init_X))
print(len(init_Y))

print(init_X_np[:5])
print(init_Y_np[:5])

#Plot the position of the initial data
figsize = (10 , 5)
fig , ax = plt.subplots(1 , 1 , figsize=figsize)
ax.set_title('Outline of f and Initial Data')
ax.grid()
ax.plot(x , y , label="f" , color="y")

#Initial data
for init_x , init_y in zip(init_X , init_Y):
    ax.plot(init_x , init_y , marker="o" , color="r")

ax.set_xlim(xlim_fr , xlim_to)
plt.show()

The red dots show the initial data.

The flow of optimization is as follows (a hand-rolled sketch of the loop appears after the list).

(1) From the data acquired so far, calculate the prediction (probability distribution) of the f value over the domain.
(2) Calculate the acquisition function value from each prediction.
(3) Take the point (x) where the acquisition function value is maximized as the next point to evaluate.

(Figures: init_data_1.jpg, init_data_2.jpg)
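Written by hand, this loop might look like the following sketch, which uses GPy directly (GPyOpt is built on GPy) together with the EI function from the Premise section. This is only to illustrate the flow; GPyOpt's internals differ in detail, and the very large f values make the fit numerically rough.

# Hand-rolled sketch of the optimization loop (illustration only).
import numpy as np
import GPy
from scipy.stats import norm

def ei(mu, sigma, f_best):
    sigma = np.maximum(sigma, 1e-12)
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

X = init_X_np.astype(float)  # initial data from step 3
Y = init_Y_np.astype(float)
grid = np.linspace(xlim_fr, xlim_to, 401).reshape(-1, 1)

for _ in range(10):
    gp = GPy.models.GPRegression(X, Y)        # (1) fit a GP to the data
    gp.optimize()
    mu, var = gp.predict(grid)                #     predict over the domain
    sigma = np.sqrt(np.maximum(var.ravel(), 0.0))
    acq = ei(mu.ravel(), sigma, Y.min())      # (2) acquisition value per point
    x_next = grid[np.argmax(acq)].reshape(1, 1)
    X = np.vstack([X, x_next])                # (3) evaluate f at the maximizer
    Y = np.vstack([Y, f(x_next)])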

4. Bayesian optimization_model initialization

#Domain
bounds = [{'name': 'x', 'type': 'continuous', 'domain': (xlim_fr,xlim_to)}]

# X, Y: initial data
# initial_design_numdata: the number of initial points to generate; no setting is needed when X and Y above are specified.
# normalize_Y: True to standardize the objective function (Gaussian process) values. (False this time, to make the prediction easier to compare with the true values.)
myBopt = GPyOpt.methods.BayesianOptimization(f=f
                                             , domain=bounds
                                             , X=init_X_np
                                             , Y=init_Y_np
                                             , normalize_Y=False
                                             , maximize=False
                                             #, initial_design_numdata=50
                                             , acquisition_type='EI')

5. Bayesian optimization_train

myBopt.run_optimization(max_iter=10)

6. Bayesian optimization_learning results / processes

#Optimal solution obtained
x_opt = myBopt.x_opt
fx_opt = myBopt.fx_opt
print("x_opt" , ":" , x_opt)
print("fx_opt" , ":" , fx_opt)

#Optimization trajectory
print("X.shape" , ":" , myBopt.X.shape)
print("Y.shape" , ":" , myBopt.Y.shape)

print("-" * 50)
print("X[:5]" , ":")
print(myBopt.X[:5])
print("-" * 50)
print("Y[:5]" , ":")
print(myBopt.Y[:5])

(Figures: opt_result_1.jpg, opt_result_2.jpg)

7. Prediction_Graphing

#Gaussian process regression model
model = myBopt.model.model

#Prediction (a tuple: first element is the mean, second element is the variance)
np_x = np.array(x).reshape(len(x) , 1)
pred = model.predict(np_x)

model.plot()
myBopt.plot_acquisition()

model.plot()
plt.plot(x , y , label="f" , color="y")
plt.plot(x_opt , fx_opt , label="Optimal solution" , marker="o" , color="r")
plt.xlim(xlim_fr , xlim_to)
plt.legend()
plt.grid()
plt.title("f of True & Predict")
plt.show()

The red dot in the third graph is the optimal solution that was obtained. Looking at the graphs, the minimum value was found properly.

(Figures: predict_1.jpg, predict_2.jpg, predict_3.jpg)

Summary

(Detailed errors aside) Bayesian optimization was indeed able to find the minimum value.

However, the initial data this time included $x = 250$, a point close to the optimal solution (the minimizer). That may have made the optimization easier.

If the initial data consists only of points far from the optimal solution, more trials may be needed.
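One way to check this (untested; the settings below are hypothetical) would be to rerun the same model with only distant initial points and a larger trial budget:

# Hypothetical re-run: initial data only far from the optimum near x = 270.
init_X_far = np.array([-100.0, -50.0, 0.0]).reshape(-1, 1)
init_Y_far = f(init_X_far)

myBopt2 = GPyOpt.methods.BayesianOptimization(f=f
                                              , domain=bounds
                                              , X=init_X_far
                                              , Y=init_Y_far
                                              , normalize_Y=False
                                              , maximize=False
                                              , acquisition_type='EI')
myBopt2.run_optimization(max_iter=30)  # more iterations than the 10 used above
print(myBopt2.x_opt, myBopt2.fx_opt)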
