[PYTHON] Introduction of scikit-optimize

This article is the day-17 entry of the Machine Learning Advent Calendar 2016.

This time, I would like to introduce a library called scikit-optimize, which can estimate the parameters that minimize a black-box function.

Installation

The environment tested this time is as follows.

Installation is easy from pip.

pip install scikit-optimize
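To confirm that the installation succeeded, importing the package and printing its version is a quick sanity check (skopt exposes the usual __version__ attribute):

import skopt
print(skopt.__version__)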

Example

The Getting Started example in the README.md on GitHub minimizes a function with noise added. In that example the function is known, but in reality the function may be unknown. Even in such a case, as long as the function can be evaluated at a given point x (without knowing its form), the x that minimizes it can be estimated with a method called Bayesian optimization.

import numpy as np
from skopt import gp_minimize

def f(x):
    # noisy 1-D function: a smooth signal with Gaussian noise added
    return np.sin(5 * x[0]) * (1 - np.tanh(x[0] ** 2)) + np.random.randn() * 0.1

res = gp_minimize(f, [(-2.0, 2.0)])

This res is an OptimizeResult that holds, among other things, the following attributes.

x: estimated location of the minimum
fun: minimum function value found, i.e. fun = min(func_vals)
x_iters: points evaluated during the search
func_vals: function values at those points
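For example, the estimated minimizer and the corresponding function value can be read off directly (a minimal usage sketch; attribute names as listed above):

print(res.x)    # estimated location of the minimum, a one-element list for this 1-D space
print(res.fun)  # smallest observed function value, equal to min(res.func_vals)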

For Machine Learning

Machine learning (especially supervised learning) aims to build models from datasets and improve predictive performance on unknown data. In doing so, the model is evaluated with cross-validation and various evaluation metrics. Furthermore, tuning the hyperparameters is indispensable if you want to build a higher-performing model. This time, I will try tuning these hyperparameters using skopt.

Preparation

Decide on the dataset and the machine-learning model. This page also has an example, but while we are at it, I will try a slightly different model.

Procedure

  1. Prepare the data and model.
import numpy as np
from skopt import gp_minimize
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

data = load_breast_cancer()
X, y = data.data, data.target
n_features = X.shape[1]  # number of features, used to bound max_features
model = GradientBoostingClassifier()
  2. Define a black-box function.
def objective(params):
    max_depth, lr, max_features, min_samples_split, min_samples_leaf = params
    
    model.set_params(max_depth=max_depth,
                     max_features=max_features,
                     learning_rate=lr,
                     min_samples_split=min_samples_split,
                     min_samples_leaf=min_samples_leaf)
    
    # gp_minimize can only minimize, so for a metric where higher is better (ROC AUC here), return its negative value.
    return -np.mean(cross_val_score(model, X, y, cv=5, scoring='roc_auc'))
  3. Determine the parameter search range (Space).
# Ranges in the same order as the unpacking in objective(): max_depth, learning_rate, max_features, min_samples_split, min_samples_leaf
space = [(1, 5), (10**-5, 10**-1, "log-uniform"), (1, n_features), (2, 30), (1, 30)]
  4. Determine the initial value of the search.
x0 = [3, 0.01, 6, 2, 1]
  5. Use gp_minimize to estimate the hyperparameters that minimize the objective.
res = gp_minimize(objective, space, x0=x0, n_calls=50)

print(res.fun) # -0.993707074488
print(res.x)   # [5, 0.096319962593215167, 1, 30, 22]

In this way, we were able to find good hyperparameters. By the way, with this dataset, gp_minimize took about 17 seconds.
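As a follow-up, the parameters found by gp_minimize can be plugged back into the estimator and fitted on the full dataset; a minimal sketch reusing the names defined in the steps above:

best_depth, best_lr, best_max_features, best_min_split, best_min_leaf = res.x
model.set_params(max_depth=best_depth,
                 learning_rate=best_lr,
                 max_features=best_max_features,
                 min_samples_split=best_min_split,
                 min_samples_leaf=best_min_leaf)
model.fit(X, y)  # train on the full dataset with the tuned hyperparameters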

Others

The official website has several examples beyond the ones described above.
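For example, skopt also ships plotting helpers; one handy check is plotting how the best value found so far improves over the calls (a minimal sketch; plot_convergence lives in skopt.plots and requires matplotlib):

import matplotlib.pyplot as plt
from skopt.plots import plot_convergence

plot_convergence(res)  # best objective value so far vs. number of calls
plt.show()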
