[PYTHON] [Translation] hyperopt tutorial

This is a Google-translated version of the tutorial for the Python library hyperopt, which optimizes awkward search spaces with real-valued, discrete, and conditional dimensions ([wiki: FMin rev: a663e](https://github.com/hyperopt/hyperopt/wiki/FMin/a663e64546eb5cd3ed462618dcc1e41863ad8688)). The original license is reproduced at the end of this article.


This page is a tutorial on the basic usage of hyperopt.fmin(). It describes how to write an objective function that fmin can optimize, and how to describe a search space that fmin can search.

Hyperopt's job is to find the best value of a scalar-valued, possibly stochastic function over a set of possible arguments to that function. While many optimization packages assume these inputs are drawn from a vector space, hyperopt encourages you to describe your search space in more detail. By providing more information about where your function is defined and where you think the best values are, hyperopt's algorithms can search more efficiently.

To use hyperopt, you have to describe:

* the objective function to minimize
* the space over which to search
* the database in which to store all the point evaluations of the search
* the search algorithm to use

This (most basic) tutorial walks you through how to write functions and search spaces, using the default Trials database and the dummy random search algorithm. Section (1) is about the different calling conventions for communication between the objective function and hyperopt. Section (2) is about describing search spaces.

Parallel search is possible by replacing the Trials database with a MongoTrials one; there is another wiki page about using mongodb for parallel search.
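A rough sketch of what that looks like is below; the connection string and experiment key are placeholders, and a running mongod plus one or more hyperopt-mongo-worker processes are required (see that wiki page for details).

import math

from hyperopt import fmin, tpe, hp
from hyperopt.mongoexp import MongoTrials

# Placeholder connection string and experiment key -- adjust for your setup.
trials = MongoTrials('mongo://localhost:1234/foo_db/jobs', exp_key='exp1')

# The objective must be importable by the worker processes (so no lambdas here).
best = fmin(fn=math.sin,
    space=hp.uniform('x', -2, 2),
    algo=tpe.suggest,
    max_evals=100,
    trials=trials)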

Choosing the search algorithm is as simple as passing algo=hyperopt.tpe.suggest instead of algo=hyperopt.random.suggest. The search algorithms are actually callable objects whose constructors accept configuration arguments, but that is about all there is to say about choosing a search algorithm.
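As a minimal sketch of what that swap looks like on the quadratic problem used below (note: in recent hyperopt releases the random-search module is hyperopt.rand rather than hyperopt.random):

import hyperopt.rand
import hyperopt.tpe
from hyperopt import fmin, hp

# Identical objective and space; only the `algo` argument differs.
best_random = fmin(fn=lambda x: x ** 2,
    space=hp.uniform('x', -10, 10),
    algo=hyperopt.rand.suggest,
    max_evals=100)
best_tpe = fmin(fn=lambda x: x ** 2,
    space=hp.uniform('x', -10, 10),
    algo=hyperopt.tpe.suggest,
    max_evals=100)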

1. Defining a function to minimize

Hyperopt offers a few levels of increasing flexibility and complexity when it comes to specifying an objective function to minimize. The questions to think about as a designer are:

* Do you want to save additional information beyond the function return value, such as other statistics and diagnostics collected while computing the objective?
* Do you want to use optimization algorithms that require more than the function value?
* Do you want to communicate between parallel processes (e.g. other workers, or the minimization algorithm)?

The next few sections look at various ways of implementing an objective function that minimizes a quadratic objective over a single variable. In each section we search over the range -10 to +10, which we can describe with a *search space*:

space = hp.uniform('x', -10, 10)

Below, Section 2 covers how to specify search spaces that are more complicated.

1.1 The simplest case

The simplest protocol for communication between hyperopt's optimization algorithms and your objective function is that your objective function receives a valid point from the search space and returns the floating-point *loss* (a.k.a. negative utility) associated with that point.

from hyperopt import fmin, tpe, hp
best = fmin(fn=lambda x: x ** 2,
    space=hp.uniform('x', -10, 10),
    algo=tpe.suggest,
    max_evals=100)
print(best)

This protocol has the advantage of being very readable and quick to type. As you can see, it is almost a one-liner. Its disadvantages are that (1) this kind of function cannot return extra information about each evaluation to the trials database, and (2) this kind of function cannot interact with the search algorithm or other concurrent function evaluations. The next examples show why you might want to do these things.

1.2 Attaching extra information via the Trials object

If your objective function is complicated and takes a long time to run, you will almost certainly want to save more statistics and diagnostic information than just the one floating-point loss that comes out at the end. For such cases, the fmin function can handle dictionary return values. That is, your loss function can return a nested dictionary with all the statistics and diagnostics you want. The reality is a little less flexible than that, though: when using mongodb, for example, the dictionary must be a valid JSON document. Still, there is plenty of flexibility to store domain-specific auxiliary results.

When the objective function returns a dictionary, the fmin function looks for some special key-value pairs in the return value, which it passes along to the optimization algorithm. There are two mandatory key-value pairs:

* status - one of the keys in hyperopt.STATUS_STRINGS, such as 'ok' for successful completion and 'fail' when the function turned out to be undefined.
* loss - the float-valued function value that you are trying to minimize; if the status is 'ok' then this has to be present.

The fmin function also responds to some optional keys:

* attachments - a dictionary of key-value pairs whose keys are short strings (like filenames) and whose values are potentially long strings (like file contents) that do not need to be loaded from the database every time the record is accessed.
* loss_variance - float, the uncertainty of a stochastic objective function.
* true_loss - when doing hyperparameter optimization, you can store the generalization error of your model here.
* true_loss_variance - float, the uncertainty of the generalization error.

Since the dictionary is meant to be usable with a variety of back-end storage mechanisms, you should make sure it is JSON-compatible. A tree-structured graph of dictionaries, lists, tuples, numbers, strings, and date-times is fine.

**Hint:** To store numpy arrays, consider serializing them to strings and saving them as attachments.
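For example, a small sketch of that hint (the array contents and the 'history' key are made up for illustration; this is not part of the original tutorial):

import pickle
import numpy as np
from hyperopt import STATUS_OK

def objective_with_array_attachment(x):
    history = np.array([x, x ** 2])            # some made-up per-trial data
    return {
        'loss': x ** 2,
        'status': STATUS_OK,
        # serialize the array to a string and ship it as an attachment
        'attachments': {'history': pickle.dumps(history)},
    }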

Writing the function above in this dictionary-returning style looks like this:

import pickle
import time
from hyperopt import fmin, tpe, hp, STATUS_OK

def objective(x):
    return {'loss': x ** 2, 'status': STATUS_OK }

best = fmin(objective,
    space=hp.uniform('x', -10, 10),
    algo=tpe.suggest,
    max_evals=100)

print(best)

1.3 The Trials object

To really see the purpose of returning a dictionary, let's modify the objective function to return some more things, and pass an explicit trials argument to fmin.

import pickle
import time
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

def objective(x):
    return {
        'loss': x ** 2,
        'status': STATUS_OK,
        # -- store other results like this
        'eval_time': time.time(),
        'other_stuff': {'type': None, 'value': [0, 1, 2]},
        # -- attachments are handled differently
        'attachments':
            {'time_module': pickle.dumps(time.time)}
        }
trials = Trials()
best = fmin(objective,
    space=hp.uniform('x', -10, 10),
    algo=tpe.suggest,
    max_evals=100,
    trials=trials)

print(best)

In this case the call to fmin proceeds as before, but by passing in a trials object directly we can inspect all of the return values that were calculated during the experiment.

So, for example:

* trials.trials - a list of dictionaries representing everything about the search
* trials.results - a list of dictionaries returned by 'objective' during the search
* trials.losses() - a list of losses (float for each 'ok' trial)
* trials.statuses() - a list of status strings

This trials object can be saved, passed on to the built-in plotting routines, or analyzed with your own custom code.
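Continuing from the example above, a quick sketch of inspecting and saving the trials object (the filename is arbitrary):

print(trials.losses())      # one float per 'ok' trial
print(trials.statuses())    # one status string per trial
print(len(trials.trials))   # the full record of every evaluation

# The whole object can be pickled for later plotting or analysis.
with open('quadratic_trials.pkl', 'wb') as f:
    pickle.dump(trials, f)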

The `attachments` are handled by a special mechanism that makes it possible to use the same code for both `Trials` and `MongoTrials`.

You can retrieve a trial attachment like this, which retrieves the 'time_module' attachment of the 5th trial:

msg = trials.trial_attachments(trials.trials[5])['time_module']
time_module = pickle.loads(msg)

The syntax is somewhat involved because attachments are large strings, so when using MongoTrials we do not want to download more than necessary. Strings can also be attached globally to the entire trials object via trials.attachments, which behaves like a string-to-string dictionary.
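For example, treating trials.attachments as a string-to-string dictionary could look like this (the key and value are made up):

trials.attachments['notes'] = 'quadratic toy experiment, 100 evaluations'
print(trials.attachments['notes'])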

**N.B.** Currently, the trial-specific attachments of a Trials object are placed into the same global trials attachment dictionary, but that may change in the future, and it is not true of MongoTrials.

1.4 The Ctrl object for real-time communication with MongoDB

It is possible for fmin() to give your objective function a handle to the mongodb used by a parallel experiment. This mechanism makes it possible to update the database with partial results, and to communicate with other concurrent processes that are evaluating different points. Your objective function can even add new search points, just like random.suggest.

The basic technique involves:

* using the fmin_pass_expr_memo_ctrl decorator on your objective function;
* within your function, using pyll.rec_eval to build the search-space point from expr and memo;
* using ctrl, an instance of hyperopt.Ctrl, to communicate with the live trials object.

It's beyond the scope of this short tutorial, but I do want to mention what is possible with the current codebase, and give you some terms to grep for in the hyperopt source, the unit tests, and example projects such as hyperopt-convnet. Email me or file a github issue if you'd like some help getting up to speed with this part of the code.

2. Defining a search space

A search space consists of nested function expressions, including stochastic expressions. The stochastic expressions are the hyperparameters. Sampling from this nested stochastic program defines the random search algorithm. The hyperparameter optimization algorithms work by replacing the normal "sampling" logic with adaptive exploration strategies, which make no attempt to actually sample from the distributions specified in the search space.

It's best to think of search spaces as stochastic argument-sampling programs. For example:

from hyperopt import hp
space = hp.choice('a',
    [
        ('case 1', 1 + hp.lognormal('c1', 0, 1)),
        ('case 2', hp.uniform('c2', -10, 10))
    ])

The result of running this code fragment is a variable space that refers to a graph of expression identifiers and their arguments. Nothing has actually been sampled; it is just a graph describing *how* to sample a point. The code for dealing with this kind of expression graph lives in hyperopt.pyll, and we call these graphs pyll graphs or *pyll programs*.

If you like, you can evaluate a sample space by sampling from it:

import hyperopt.pyll.stochastic
print(hyperopt.pyll.stochastic.sample(space))

This search space, described by space, has three parameters:

* 'a' - selects the case
* 'c1' - a positive-valued parameter that is only used in 'case 1'
* 'c2' - a bounded real-valued parameter that is only used in 'case 2'

One thing to notice here is that every optimizable stochastic expression has a *label* as its first argument. These labels are used to return parameter choices to the caller, and in various ways internally as well.

Another thing to notice is that we used tuples in the middle of the graph (around each of 'case 1' and 'case 2'). Lists, dictionaries, and tuples are all upgraded to "deterministic function expressions" so that they can be part of the search space stochastic program.

A third thing to notice is the numeric expression 1 + hp.lognormal('c1', 0, 1) embedded in the description of the search space. As far as the optimization algorithms are concerned, there is no difference between adding the 1 directly in the search space and adding the 1 within the logic of the objective function itself. As the designer, you can choose where to put this sort of processing to achieve the kind of modularity you want. Note that the result of intermediate arithmetic expressions within the search space can be any Python object, even when optimizing in parallel using mongodb. It is also easy to add new kinds of non-stochastic expressions to a search space description; see Section 2.3 below.

The fourth thing to notice is that 'c1' and 'c2' are examples of what we call *conditional parameters*. Each of 'c1' and 'c2' only figures in the returned sample for a particular value of 'a'. If 'a' is 0, then 'c1' is used but 'c2' is not. If 'a' is 1, then 'c2' is used but 'c1' is not. Whenever it makes sense, you should encode parameters as conditional ones this way, rather than simply ignoring them in the objective function. If you expose the fact that 'c1' sometimes has no effect on the objective function (because it has no effect on the objective function's arguments), the search can be more efficient.

2.1 Parameter expressions

The stochastic expressions currently recognized by hyperopt's optimization algorithms are:

* hp.choice(label, options) - returns one of the options, which should be a list or tuple. The elements of options can themselves be [nested] stochastic expressions; in this case, the stochastic choices that appear in only some of the options become conditional parameters.
* hp.randint(label, upper) - returns a random integer in the range [0, upper).
* hp.uniform(label, low, high) - returns a value uniformly between low and high.
* hp.quniform(label, low, high, q) - returns a value like round(uniform(low, high) / q) * q.
* hp.loguniform(label, low, high) - returns a value drawn according to exp(uniform(low, high)), so that the logarithm of the return value is uniformly distributed.
* hp.qloguniform(label, low, high, q) - returns a value like round(exp(uniform(low, high)) / q) * q.
* hp.normal(label, mu, sigma) - returns a real value normally distributed with mean mu and standard deviation sigma.
* hp.qnormal(label, mu, sigma, q) - returns a value like round(normal(mu, sigma) / q) * q.
* hp.lognormal(label, mu, sigma) - returns a value drawn according to exp(normal(mu, sigma)), so that the logarithm of the return value is normally distributed.
* hp.qlognormal(label, mu, sigma, q) - returns a value like round(exp(normal(mu, sigma)) / q) * q.
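As a small illustration, here is a made-up space that mixes a few of these expressions (the parameter names are arbitrary):

from hyperopt import hp

space = {
    'n_layers': hp.quniform('n_layers', 1, 4, 1),            # quantized uniform: 1, 2, 3, or 4
    'learning_rate': hp.loguniform('learning_rate', -7, 0),  # log of the value is uniform on [-7, 0]
    'activation': hp.choice('activation', ['relu', 'tanh']), # one of the listed options
}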

2.2 A Search Space Example: scikit-learn

To see all these possibilities in action, let's look at how one might describe the space of hyperparameters of classification algorithms in scikit-learn. (This idea is being developed in hyperopt-sklearn.)

from hyperopt import hp
space = hp.choice('classifier_type', [
    {
        'type': 'naive_bayes',
    },
    {
        'type': 'svm',
        'C': hp.lognormal('svm_C', 0, 1),
        'kernel': hp.choice('svm_kernel', [
            {'ktype': 'linear'},
            {'ktype': 'RBF', 'width': hp.lognormal('svm_rbf_width', 0, 1)},
            ]),
    },
    {
        'type': 'dtree',
        'criterion': hp.choice('dtree_criterion', ['gini', 'entropy']),
        'max_depth': hp.choice('dtree_max_depth',
            [None, hp.qlognormal('dtree_max_depth_int', 3, 1, 1)]),
        'min_samples_split': hp.qlognormal('dtree_min_samples_split', 2, 1, 1),
    },
    ])
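A point sampled from this space is a nested dictionary, so an objective function would typically dispatch on the 'type' key. The following sketch is not part of the original tutorial: the dataset, the cross-validation scoring, and the mapping of 'width' to the SVC gamma parameter are illustrative assumptions.

from hyperopt import fmin, tpe, STATUS_OK
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)   # a small toy dataset for illustration

def classifier_objective(args):
    # `args` is one sample from `space`, e.g.
    # {'type': 'svm', 'C': 1.3, 'kernel': {'ktype': 'linear'}}
    if args['type'] == 'naive_bayes':
        clf = GaussianNB()
    elif args['type'] == 'svm':
        if args['kernel']['ktype'] == 'linear':
            clf = SVC(C=args['C'], kernel='linear')
        else:
            clf = SVC(C=args['C'], kernel='rbf', gamma=args['kernel']['width'])
    else:  # 'dtree'
        max_depth = args['max_depth']
        clf = DecisionTreeClassifier(
            criterion=args['criterion'],
            max_depth=None if max_depth is None else max(1, int(max_depth)),
            min_samples_split=max(2, int(args['min_samples_split'])))
    accuracy = cross_val_score(clf, X, y, cv=3).mean()
    return {'loss': -accuracy, 'status': STATUS_OK}   # minimize negative accuracy

best = fmin(classifier_objective, space, algo=tpe.suggest, max_evals=50)
print(best)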

2.3 Adding non-stochastic expressions with pyll

You can use such nodes as arguments to pyll functions (see pyll). File a github issue if you want to know more about this.

In short, you just have to decorate a top-level (i.e. pickle-friendly) function so that it can be used via the scope object.

import hyperopt.pyll.stochastic
from hyperopt import hp
from hyperopt.pyll import scope

@scope.define
def foo(a, b=0):
    print('running foo', a, b)
    return a + b / 2

# -- this will print 0, foo is called as usual.
print(foo(0))

# In describing search spaces you can use `foo` just as you would in
# ordinary Python. These two calls don't actually call foo; they just
# record that foo needs to be called in order to evaluate the graph.

space1 = scope.foo(hp.uniform('a', 0, 10))
space2 = scope.foo(hp.uniform('a', 0, 10), hp.normal('b', 0, 1))

# -- this will print a pyll.Apply node
print(space1)

# -- this will draw a sample by running foo()
print(hyperopt.pyll.stochastic.sample(space1))

2.4 Adding new kinds of hyperparameters

Adding new kinds of stochastic expressions for describing parameter search spaces should be avoided if possible. In order for all search algorithms to work on all spaces, the search algorithms must agree on the kinds of hyperparameters that describe the space. As the maintainer of the library, I am open to the possibility that some kinds of expressions should be added from time to time, but as I said, I would like to avoid it as much as possible. Adding new kinds of stochastic expressions is not one of the ways hyperopt is meant to be extensible.


Copyright (c) 2013, James Bergstra All rights reserved.
