[PYTHON] I tried Bayesian optimization!

Overview

Using a Bayesian optimization library, I tried to see whether it could actually find the optimum of a function reasonably well.

The library used is "GPyOpt".

The following book was used as a reference for the theory: "Gaussian Processes and Machine Learning" (authors: Daichi Mochihashi, Shigeyuki Oba).

Experiment outline

Create an objective function and find its minimum value (within the specified domain) by Bayesian optimization. The detailed settings are as follows.

--Bayesian optimization library: GPyOpt (.methods.BayesianOptimization)
--Objective function: $y = f(x) = (x-300)(x-200)(x-15)(x-5)(x+6)(x+10)(x+100)$
--Optimization policy: minimization
--Domain (x): [-100, 300]
--Acquisition function: EI (expected improvement)
--Initial data (x, y): x = -50, 0, 50, 100, 150, 200, 250
--Kernel: not specified. (The library's default settings were not investigated.) This time I will leave the kernel aside and just look at the results.

Premise

--Bayesian optimization: a method for finding the minimum (or maximum) value of an objective function using Gaussian process regression. An acquisition function is applied to the prediction (probability distribution) obtained from Gaussian process regression, and the point where the acquisition function value is maximized is chosen as the next point to search.
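As a rough illustration of what an acquisition function does, here is a minimal sketch of EI (expected improvement) for minimization. The names mu, sigma, and f_best are mine, not GPyOpt's: they stand for the Gaussian process's predictive mean and standard deviation at candidate points, and the best objective value observed so far.

# Minimal sketch of EI (expected improvement) for minimization.
# mu, sigma: GP predictive mean / standard deviation at candidate points.
# f_best: best (smallest) objective value observed so far.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    sigma = np.maximum(sigma, 1e-12)  # guard against division by zero
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# The candidate with the largest EI is searched next.
mu = np.array([0.0, -1.0])
sigma = np.array([1.0, 0.5])
print(expected_improvement(mu, sigma, f_best=-0.5))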

--Gaussian process regression: a type of Bayesian inference. The objective function value $f(x)$ at each input point $x$ is regarded as a random variable, and $(f(x_1), f(x_2), \dots, f(x_n), \dots)$ is treated as following a multivariate normal distribution $N(\mu, \Sigma)$.

Given the observed input points $x_1, x_2, \dots, x_n$, the objective function value (a normal distribution) at a new input point $x_{n+1}$ is taken to be

$f(x_{n+1}) \sim p(f(x_{n+1}) \mid x_{n+1}, f(x_1), f(x_2), \dots, f(x_n))$

and regression is performed under this view. The expected value of this distribution is often used as the predicted value. (The function $f$ above is called a "Gaussian process".)
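As a concrete sketch (my own toy example, not from the book or from GPyOpt), the mean and variance of this conditional distribution can be computed directly with numpy. An RBF kernel is used only to make the sketch runnable; kernels are described in the next item.

# Toy sketch: conditional distribution of f(x_{n+1}) given observed points.
import numpy as np

def k(a, b, length=1.0):
    # RBF kernel (see the "Kernel" item below); an illustrative choice
    return np.exp(-(a - b) ** 2 / (2 * length ** 2))

X_obs = np.array([-1.0, 0.0, 2.0])   # observed inputs x_1, ..., x_n
y_obs = np.sin(X_obs)                # observed values f(x_1), ..., f(x_n)
x_new = 1.0                          # new input x_{n+1}

K = k(X_obs[:, None], X_obs[None, :])                 # covariance matrix Sigma
k_vec = k(X_obs, x_new)                               # covariances with x_new
K_inv = np.linalg.inv(K + 1e-8 * np.eye(len(X_obs)))  # jitter for stability

mean = k_vec @ K_inv @ y_obs                   # E[f(x_{n+1}) | data]
var = k(x_new, x_new) - k_vec @ K_inv @ k_vec  # Var[f(x_{n+1}) | data]
print(mean, var)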

--Kernel: in Gaussian process regression, a function that specifies each component of the variance-covariance matrix $\Sigma$ of $(f(x_1), f(x_2), \dots, f(x_n), \dots)$. The $(i, j)$ component, that is, the covariance of $f(x_i)$ and $f(x_j)$, is defined by a function $k(x_i, x_j)$ that depends on $x_i$ and $x_j$; this function $k$ is called the "kernel". Several kernels are commonly used, and their basic property is: "if the input points $x_i$ and $x_j$ are close, then $f(x_i)$ and $f(x_j)$ are also close."
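For example, the RBF (Gaussian) kernel, one common choice, expresses exactly this property: the covariance is close to 1 for nearby inputs and decays toward 0 as they move apart. (A small demonstration of my own; not necessarily GPyOpt's default.)

# RBF kernel: covariance decays with the distance between inputs.
import numpy as np

def rbf(x_i, x_j, length=1.0):
    return np.exp(-(x_i - x_j) ** 2 / (2 * length ** 2))

print(rbf(0.0, 0.1))  # close inputs   -> covariance near 1
print(rbf(0.0, 3.0))  # distant inputs -> covariance near 0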

Experiment details

1. Library import

#[document] https://gpyopt.readthedocs.io/en/latest/index.html
#[Source] https://github.com/SheffieldML/GPyOpt/blob/master/GPyOpt/methods/bayesian_optimization.py

# pip install GPyOpt
import GPyOpt

import matplotlib.pyplot as plt
import numpy as np

2. Objective function

#Objective function definition
def f(x):
    y = (x-300)*(x-200)*(x-15)*(x-5)*(x+6)*(x+10)*(x+100)
    return y

#Domain definition
xlim_fr = -100
xlim_to = 300

#Graph
x = [i for i in range(xlim_fr , xlim_to + 1)]
y = [f(_x) for _x in x]

figsize = (10 , 5)
fig , ax = plt.subplots(1 , 1 , figsize=figsize)
ax.set_title('Outline of f')
ax.grid()
ax.plot(x , y)
ax.set_xlim(xlim_fr , xlim_to)
plt.show()

From the graph, f appears to take its minimum within the domain at around x = 270. Let's see whether Bayesian optimization finds that point.

(Figure: obj_func.jpg)

3. Initial data

#Initial data
init_X = [i for i in range(-50 , 300 , 50)]
init_X_np = np.array(init_X).reshape((len(init_X) , 1))

init_Y = [f(i) for i in init_X]
init_Y_np = np.array(init_Y).reshape((len(init_Y) , 1))

print(len(init_X))
print(len(init_Y))

print(init_X_np[:5])
print(init_Y_np[:5])

#Plot the position of the initial data
figsize = (10 , 5)
fig , ax = plt.subplots(1 , 1 , figsize=figsize)
ax.set_title('Outline of f and Initial Data')
ax.grid()
ax.plot(x , y , label="f" , color="y")

#Initial data
for init_x , init_y in zip(init_X , init_Y):
    ax.plot(init_x , init_y , marker="o" , color="r")

ax.set_xlim(xlim_fr , xlim_to)
plt.show()

The red dots show the initial data.

The flow of optimization is as follows (a hand-rolled sketch of the loop appears after the list).

(1) From the data acquired so far, calculate the prediction (probability distribution) of the f value over the domain.
(2) Calculate the acquisition function value from each prediction.
(3) Take the point (x) where the acquisition function value is maximized as the next point to evaluate.

(Figures: init_data_1.jpg, init_data_2.jpg)
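Written by hand, this loop might look like the following sketch, which uses GPy directly (GPyOpt is built on GPy) together with the EI function from the Premise section. This is only to illustrate the flow; GPyOpt's internals differ in detail, and the very large f values make the fit numerically rough.

# Hand-rolled sketch of the optimization loop (illustration only).
import numpy as np
import GPy
from scipy.stats import norm

def ei(mu, sigma, f_best):
    sigma = np.maximum(sigma, 1e-12)
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

X = init_X_np.astype(float)  # initial data from step 3
Y = init_Y_np.astype(float)
grid = np.linspace(xlim_fr, xlim_to, 401).reshape(-1, 1)

for _ in range(10):
    gp = GPy.models.GPRegression(X, Y)        # (1) fit a GP to the data
    gp.optimize()
    mu, var = gp.predict(grid)                #     predict over the domain
    sigma = np.sqrt(np.maximum(var.ravel(), 0.0))
    acq = ei(mu.ravel(), sigma, Y.min())      # (2) acquisition value per point
    x_next = grid[np.argmax(acq)].reshape(1, 1)
    X = np.vstack([X, x_next])                # (3) evaluate f at the maximizer
    Y = np.vstack([Y, f(x_next)])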

4. Bayesian optimization_model initialization

#Domain
bounds = [{'name': 'x', 'type': 'continuous', 'domain': (xlim_fr,xlim_to)}]

# X, Y: initial data
# initial_design_numdata: the number of initial points to generate; no setting is needed when X and Y above are specified.
# normalize_Y: True to standardize the objective function (Gaussian process) values. (False this time, to make the prediction easier to compare with the true values.)
myBopt = GPyOpt.methods.BayesianOptimization(f=f
                                             , domain=bounds
                                             , X=init_X_np
                                             , Y=init_Y_np
                                             , normalize_Y=False
                                             , maximize=False
                                             #, initial_design_numdata=50
                                             , acquisition_type='EI')

5. Bayesian optimization_train

myBopt.run_optimization(max_iter=10)

6. Bayesian optimization_learning results / processes

#Optimal solution obtained
x_opt = myBopt.x_opt
fx_opt = myBopt.fx_opt
print("x_opt" , ":" , x_opt)
print("fx_opt" , ":" , fx_opt)

#Optimization trajectory
print("X.shape" , ":" , myBopt.X.shape)
print("Y.shape" , ":" , myBopt.Y.shape)

print("-" * 50)
print("X[:5]" , ":")
print(myBopt.X[:5])
print("-" * 50)
print("Y[:5]" , ":")
print(myBopt.Y[:5])

(Figures: opt_result_1.jpg, opt_result_2.jpg)

7. Prediction_Graphing

#Gaussian process regression model
model = myBopt.model.model

#Prediction (a tuple: first element is the mean, second element is the variance)
np_x = np.array(x).reshape(len(x) , 1)
pred = model.predict(np_x)

model.plot()
myBopt.plot_acquisition()

model.plot()
plt.plot(x , y , label="f" , color="y")
plt.plot(x_opt , fx_opt , label="Optimal solution" , marker="o" , color="r")
plt.xlim(xlim_fr , xlim_to)
plt.legend()
plt.grid()
plt.title("f of True & Predict")
plt.show()

The red dot in the third graph is the optimal solution that was obtained. Looking at the graphs, the minimum value was found properly.

(Figures: predict_1.jpg, predict_2.jpg, predict_3.jpg)

Summary

(Detailed errors aside) Bayesian optimization was indeed able to find the minimum value.

However, the initial data this time included $x = 250$, a point close to the optimal solution (the minimizer). That may have made the optimization easier.

If the initial data consists only of points far from the optimal solution, more trials may be needed.
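One way to check this (untested; the settings below are hypothetical) would be to rerun the same model with only distant initial points and a larger trial budget:

# Hypothetical re-run: initial data only far from the optimum near x = 270.
init_X_far = np.array([-100.0, -50.0, 0.0]).reshape(-1, 1)
init_Y_far = f(init_X_far)

myBopt2 = GPyOpt.methods.BayesianOptimization(f=f
                                              , domain=bounds
                                              , X=init_X_far
                                              , Y=init_Y_far
                                              , normalize_Y=False
                                              , maximize=False
                                              , acquisition_type='EI')
myBopt2.run_optimization(max_iter=30)  # more iterations than the 10 used above
print(myBopt2.x_opt, myBopt2.fx_opt)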
