Applying Bayesian optimization to a Keras DNN model

Parameter tuning is a difficult task when building machine learning models, and not only in deep learning. A model's accuracy depends on its parameters, but the parameters are numerous and their ranges are wide, so finding the optimal combination is hard.

For example, consider a neural network model with three hidden layers. Many factors must be decided, such as the number of neurons output by each layer, the dropout rates, the batch size, and the number of epochs, and each has a wide range of candidate values. There are various ways to choose each parameter, but the following are examples of the possible values.

- Number of neurons: an integer greater than or equal to 1
- Dropout rate: a decimal greater than or equal to 0 and less than 1
- Batch size: an integer no larger than the number of input samples
- Number of epochs: an integer greater than or equal to 1

Appropriate values exist for each network, but honestly, parameter tuning is tedious because the right values depend on both the data and the model. Moreover, you cannot tell whether a parameter is good or bad until you try it.

Earlier, I introduced how to search for parameters with GridSearchCV: http://qiita.com/cvusk/items/285e2b02b0950537b65e

That method tries every combination of the parameter choices you define and picks the best one. Its drawback is that the number of combinations to try grows multiplicatively as parameters are added, as the quick check below illustrates.
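As a rough illustration, even the discrete domains used later in this post multiply quickly (the three candidate tuples below are taken from the bounds defined further down; the arithmetic is the point):

import itertools

# candidate values for three of the discrete parameters used later in this post
l1_out_choices = (64, 128, 256, 512, 1024)
batch_size_choices = (10, 100, 500)
epochs_choices = (5, 10, 20)

# grid search would train the model once per combination
grid = list(itertools.product(l1_out_choices, batch_size_choices, epochs_choices))
print(len(grid))  # 5 * 3 * 3 = 45 full training runs, for only three parameters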

This time, I will introduce Bayesian optimization as a more powerful parameter tuning method.

What to do this time

I will show how to optimize a model written in Keras using Bayesian optimization. The example model is an MNIST classifier.

I would like to optimize the following parameters in a three-layer model with one input layer, one hidden layer, and one output layer.

- Number of outputs in the input layer
- Input layer dropout rate
- Number of outputs in the hidden layer
- Hidden layer dropout rate
- Batch size
- Number of epochs
- Validation data ratio

About Bayesian optimization

Bayesian optimization is explained in detail in the following articles.

- Introduction to Bayesian Optimization
- [Hyperparameter exploration of machine learning: Utilization of Bayesian optimization](http://www.techscore.com/blog/2016/12/20/%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%81%AE%E3%83%8F%E3%82%A4%E3%83%91%E3%83%BC%E3%83%91%E3%83%A9%E3%83%A1%E3%83%BC%E3%82%BF%E6%8E%A2%E7%B4%A2-%E3%83%99%E3%82%A4%E3%82%BA%E6%9C%80%E9%81%A9/)
- Attempt to support medical image interpretation by Deep Learning and Bayesian Optimization

In Bayesian optimization, the parameters are the input, the model's validation accuracy (loss function, precision, etc.) is the output, and the function (the model) in between is treated as a black box. Assuming the black-box function follows a [Gaussian process](https://ja.wikipedia.org/wiki/%E3%82%AC%E3%82%A6%E3%82%B9%E9%81%8E%E7%A8%8B), its posterior distribution is updated with every evaluation, and the parameters are optimized by repeating this evaluate-and-update loop.
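To make the loop concrete, here is a minimal sketch of the idea using GPy on a toy one-dimensional objective (the objective function and all numbers are made up for illustration; GPyOpt, introduced next, wraps this whole loop for you):

import numpy as np
import GPy
from scipy.stats import norm

# toy black-box objective to minimize (an assumption for illustration)
def objective(x):
    return np.sin(3 * x) + 0.1 * x ** 2

X = np.array([[0.5], [1.5], [2.5]])   # initial observations
Y = objective(X)
grid = np.linspace(0.0, 3.0, 200).reshape(-1, 1)

for _ in range(5):
    # fit a Gaussian process to everything observed so far
    model = GPy.models.GPRegression(X, Y)
    model.optimize()
    mu, var = model.predict(grid)
    sigma = np.sqrt(np.maximum(var, 1e-12))
    # expected improvement acquisition (for minimization)
    best = Y.min()
    z = (best - mu) / sigma
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    # evaluate the black box where the acquisition is highest
    x_next = grid[np.argmax(ei)].reshape(1, 1)
    X = np.vstack([X, x_next])
    Y = np.vstack([Y, objective(x_next)])

print(X[np.argmin(Y)], Y.min())  # best input found and its value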

Tool

In Python, you can perform Bayesian optimization with a tool called GPyOpt. For details on how to use it, see here: Bayesian optimization package GPyOpt with Python
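Before the Keras example, here is a minimal sketch of the GPyOpt interface on the same toy one-dimensional objective as above (everything here is illustrative):

import numpy as np
import GPyOpt

# toy objective; GPyOpt calls f with a 2-D array of shape (1, n_params)
def f(x):
    return np.sin(3 * x) + 0.1 * x ** 2

bounds = [{'name': 'x', 'type': 'continuous', 'domain': (0.0, 3.0)}]
opt = GPyOpt.methods.BayesianOptimization(f=f, domain=bounds)
opt.run_optimization(max_iter=10)
print(opt.x_opt, opt.fx_opt)  # best input found and its objective value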

Code and usage

The code I wrote this time is here. https://github.com/shibuiwilliam/keras_gpyopt

I will explain the contents. First, define the MNIST model.

# Import libraries

import GPy, GPyOpt
import numpy as np
import pandas as pds
import random
from keras.layers import Activation, Dropout, BatchNormalization, Dense
from keras.models import Sequential
from keras.datasets import mnist
from keras.metrics import categorical_crossentropy
from keras.utils import np_utils
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping

# MNIST class
class MNIST():
    def __init__(self, first_input=784, last_output=10,
                 l1_out=512, 
                 l2_out=512, 
                 l1_drop=0.2, 
                 l2_drop=0.2, 
                 batch_size=100, 
                 epochs=10, 
                 validation_split=0.1):
        self.__first_input = first_input
        self.__last_output = last_output
        self.l1_out = l1_out
        self.l2_out = l2_out
        self.l1_drop = l1_drop
        self.l2_drop = l2_drop
        self.batch_size = batch_size
        self.epochs = epochs
        self.validation_split = validation_split
        self.__x_train, self.__x_test, self.__y_train, self.__y_test = self.mnist_data()
        self.__model = self.mnist_model()
        
    # load mnist data from keras dataset
    def mnist_data(self):
        (X_train, y_train), (X_test, y_test) = mnist.load_data()
        X_train = X_train.reshape(60000, 784)
        X_test = X_test.reshape(10000, 784)

        # scale pixel values to [0, 1]
        X_train = X_train.astype('float32')
        X_test = X_test.astype('float32')
        X_train /= 255
        X_test /= 255

        # one-hot encode the labels
        Y_train = np_utils.to_categorical(y_train, 10)
        Y_test = np_utils.to_categorical(y_test, 10)
        return X_train, X_test, Y_train, Y_test
    
    # mnist model
    def mnist_model(self):
        model = Sequential()
        model.add(Dense(self.l1_out, input_shape=(self.__first_input,)))
        model.add(Activation('relu'))
        model.add(Dropout(self.l1_drop))
        model.add(Dense(self.l2_out))
        model.add(Activation('relu'))
        model.add(Dropout(self.l2_drop))
        model.add(Dense(self.__last_output))
        model.add(Activation('softmax'))
        model.compile(loss='categorical_crossentropy',
                      optimizer=Adam(),
                      metrics=['accuracy'])

        return model
    
    # fit mnist model
    def mnist_fit(self):
        # stop training as soon as the validation loss stops improving
        early_stopping = EarlyStopping(patience=0, verbose=1)
        
        self.__model.fit(self.__x_train, self.__y_train,
                       batch_size=self.batch_size,
                       epochs=self.epochs,
                       verbose=0,
                       validation_split=self.validation_split,
                       callbacks=[early_stopping])
    
    # evaluate mnist model
    def mnist_evaluate(self):
        self.mnist_fit()
        
        evaluation = self.__model.evaluate(self.__x_test, self.__y_test, batch_size=self.batch_size, verbose=0)
        return evaluation

# function to run mnist class
def run_mnist(first_input=784, last_output=10,
              l1_out=512, l2_out=512, 
              l1_drop=0.2, l2_drop=0.2, 
              batch_size=100, epochs=10, validation_split=0.1):
    
    _mnist = MNIST(first_input=first_input, last_output=last_output,
                   l1_out=l1_out, l2_out=l2_out, 
                   l1_drop=l1_drop, l2_drop=l2_drop, 
                   batch_size=batch_size, epochs=epochs, 
                   validation_split=validation_split)
    mnist_evaluation = _mnist.mnist_evaluate()
    return mnist_evaluation
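Before optimizing, it is worth a quick smoke test that the model trains and evaluates at all (a single epoch just to check the wiring, not a tuned run):

# sanity check with default parameters and one epoch
evaluation = run_mnist(epochs=1)
print(evaluation)  # [test loss, test accuracy]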

Then Bayesian optimization is performed using the above MNIST model.


# Bayesian optimization
# Define the candidate values and range of each parameter.
# Note: the parameters must be listed with type 'continuous' first,
# then type 'discrete'. Otherwise, an error will occur later on.
bounds = [{'name': 'validation_split', 'type': 'continuous',  'domain': (0.0, 0.3)},
          {'name': 'l1_drop',          'type': 'continuous',  'domain': (0.0, 0.3)},
          {'name': 'l2_drop',          'type': 'continuous',  'domain': (0.0, 0.3)},
          {'name': 'l1_out',           'type': 'discrete',    'domain': (64, 128, 256, 512, 1024)},
          {'name': 'l2_out',           'type': 'discrete',    'domain': (64, 128, 256, 512, 1024)},
          {'name': 'batch_size',       'type': 'discrete',    'domain': (10, 100, 500)},
          {'name': 'epochs',           'type': 'discrete',    'domain': (5, 10, 20)}]

# Define the function for Bayesian optimization (the black box described above).
# x is a 2-D array of parameters; the validation loss is returned.
# The column indices of x follow the order of the bounds list above.
def f(x):
    print(x)
    evaluation = run_mnist(
        l1_drop = float(x[:,1]), 
        l2_drop = float(x[:,2]), 
        l1_out = int(x[:,3]),
        l2_out = int(x[:,4]), 
        batch_size = int(x[:,5]), 
        epochs = int(x[:,6]), 
        validation_split = float(x[:,0]))
    print("loss:{0} \t\t accuracy:{1}".format(evaluation[0], evaluation[1]))
    print(evaluation)
    return evaluation[0]

# Set up the optimizer; GPyOpt runs a few initial exploratory evaluations here.
opt_mnist = GPyOpt.methods.BayesianOptimization(f=f, domain=bounds)

# Search for the best parameters.
opt_mnist.run_optimization(max_iter=10)
print("optimized parameters: {0}".format(opt_mnist.x_opt))
print("optimized loss: {0}".format(opt_mnist.fx_opt))

GPyOpt.methods.BayesianOptimization defines the Bayesian optimization method. With it, you can search within bounds for the parameters that give the optimal loss, but there is a caveat. Each parameter's candidate values and range are written as a dict, where type: continuous means a continuous value and type: discrete means a set of choices. If the continuous entries are not listed before the discrete ones, `opt_mnist = GPyOpt.methods.BayesianOptimization(f=f, domain=bounds)` raises an error. I did not know this, and it had me stuck for about two days.

`opt_mnist.run_optimization(max_iter=10)` searches for the best parameters. `max_iter` specifies the upper limit on the number of training runs used in the search. We allow up to 10 iterations here, but if the search converges early, it finishes in fewer.
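After the run, you can also inspect every evaluated point, which is handy for seeing how the search moved (X and Y are the attributes GPyOpt uses to store the evaluation history):

# every parameter vector tried, columns in the order of the bounds list
print(opt_mnist.X)
# the corresponding loss returned by f for each vector
print(opt_mnist.Y)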

The execution result of the program is as follows.

(Screenshot of the console output from the optimization run.)

It shows that the search completed in just four iterations. Bayesian optimization can automate parameter tuning and greatly reduce the effort involved.
