Deep Python learned from DEAP

Introduction

Do you know the evolutionary computation framework called DEAP? e? Evolutionary Computation Before DEAP What is that state? Well, the prerequisite knowledge of evolutionary computation is not very relevant in this article. The main purpose of this article is to explain how well DEAP is using Python to implement evolutionary computation and to help improve Python power.

Evolutionary computation, especially about genetic algorithms

Even if you say OK without any prerequisite knowledge, you have to understand the words that appear after this, so I will briefly explain evolutionary computation, especially the genetic algorithm (GA) used for explanation.

GA is an algorithm inspired by the evolution of living things, as it is called "genetic". Why are the living things surviving? There are various reasons, but the following three are the main factors in GA.

selection
Weak creatures die and strong creatures survive. I'll explain what it means to be strong and weak later.
crossover
Surviving organisms are mated to leave offspring. The child inherits the genes of his parents.
mutation
Of course, the living things we have now are not from the time when the earth was created. It has evolved gradually, but it requires a child with a slightly different part besides inheriting the nature from the parent. When it accumulates, it becomes a completely different creature.

By the way, I wrote that strong creatures survive by selection, but "strong" means "high fitness". Next is "What is the goodness of fit?", But in GA it is the "value of the objective function you want to solve". It is $ f (\ vec {x}) $. Next is the story of $ \ vec {x} $.

Gene expression and manipulation

I finally came to the "heredity" of the genetic algorithm. "Heredity" is a gene. This gene is $ \ vec {x} $ when written mathematically. Gene representation is roughly divided into two, and when calculating $ f (\ vec {x}) $, it is expressed by a binary string of 0/1. It is a method called real value GA that uses the numerical value as it is as a "gene".

In ** Crossing **, I wrote that the genes of my parents are inherited, but I cut and paste the genes as follows to make a child. Where to cut is usually decided by a random number.

Father: 01101101 Mother: 11000011
Cut and paste in the middle (child 1 is the left half of the father + right half of the mother, child 1 is the left half of the mother + right half of the father)
Child 1: 01100011 Child 2: 110011101

** Mutation ** has a certain probability of inverting bits.

After crossing, child 1 mutation (left end changed from 0 to 1)
Child 1: 11100011

Whether the child created in this way survives depends on the goodness of fit (objective function value) of the child. This is ** selection **. This selection, crossover, and mutation go round and round. Then, the idea of GA is that the individual with high goodness of fit (high objective function value) survives and the desired $ \ vec {x} $ (design variable value) is obtained.

How to cross (use the crossing technique), mutate, and choose is also dependent on the nature of the problem. Of course, the nature of the target problem is often unknown, so you have to "try various things".

Well, it's been a long time, saying it's easy, but it's time to explain the moe points of DEAP.

Individual representation in DEAP

Let's take a look along the Overview. First is the individual expression. In the case of "individual expression", it is necessary to consider "how to implement the gene" and "how to implement the goodness of fit".

Implementation of fitness

In DEAP, the class that represents the goodness of fit is defined first. It's already interesting from here.

from deap import base, creator
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))

The Overview says, "This will create a FitnessMin class." No, wait, what does it mean to "create a class"? Isn't it a function call? Don't you think this? Next, take a look at creator.create reference (link is in English)

deap.creator.create(name, base[, attribute[, ...]]) Create a class named name that inherits base from the creator module. You can define the attributes specified by the keyword arguments after the third argument in that class. If the argument is a class, the object of the class passed as an argument will be created and set in the attribute when creating the object of the class to be created. If the argument is not a class, it will be set as a static attribute of the creation class.

The reference shows the class definition equivalent to writing creator.create for the Foo class, but the previouscreator.create ("FitnessMin", base.Fitness, weights = (-1.0,)) If you apply it in the same way, it will be as follows.

Will be executed within the creator module


class FitnessMin(base.Fitness):
    weights = (-1.0,)

I knew how it would work. But how are you doing? That's where the concept of ** Python is that classes are also first-class citizens ** [^ 1]. [WIkipedia First Class Citizens Section](https://en.wikipedia.org/wiki/%E7%AC%AC%E4%B8%80%E7%B4%9A%E3%82%AA%E3% 83% 96% E3% 82% B8% E3% 82% A7% E3% 82% AF% E3% 83% 88), but the most important one is here.

[^ 1]: Also called a first class object, we will use the word "class" in this article because it is different from the object-oriented "class".

Can be built at runtime.

Yes, Python can define classes at runtime. It's rarely needed when programming normally, but it's useful to be able to do this when creating a framework. Next, let's take a look at Python reference type functions.

class type(name, bases, dict) If it has three arguments, it returns a new type object. It's essentially a dynamic form of class statement. The name string is the class name, which is the name attribute. The bases tuple is a list of base classes with the bases attribute. The dict dictionary is a namespace that contains the definition of the class body and is copied into a standard dictionary for the dict attribute.

creator.create uses this type function to [dynamically define classes](https://github.com/DEAP/deap/blob/1.3.0/deap/creator.py#L143 -L171). The globals function returns a dictionary of "global variables in a module". By using this and assigning it, the class is defined (it cannot be referenced from others just by creating it with the type function). Shouldn't we just define the class normally? Or how about creating a class in the provided module? I want to dig in various things, but it's interesting so it's OK (laughs)

The inheritance source base.Fitness is also interesting, but I will explain the Moe point a little later.

Implementation of Individual

The Overview then defines the individual class.

creator.create("Individual", list, fitness=creator.FitnessMin)

The following classes will be defined in the creator module by the operation of creator.create explained above.

class Individual(list):
    def __init__(self):
        self.fitness = FitnessMin()

The important point here is that "an individual is a list [^ 2]. What to put in the list has not been decided at this point."

[^ 2]: Other than the list can be a superclass, and the documentation also gives an example of inheriting a set.

Individual initialization

GA usually randomly generates early individuals to start the loop of selection, crossover, and mutation. The following part defines the corresponding definition. To be precise, at this point, the definition alone has not created an actual individual.

import random
from deap import tools

IND_SIZE = 10

toolbox = base.Toolbox()
toolbox.register("attribute", random.random)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attribute, n=IND_SIZE)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

A new element has emerged. Toolbox class. As the name of the tool box suggests, it seems to register for this. For now, take a look at the Toolbox.register Reference (https://deap.readthedocs.io/en/master/api/base.html#deap.base.Toolbox.register).

register(alias, method[, argument[, ...]]) Register the function (method argument) with the name alias. You can set the default arguments to pass to the registered function. It can be overwritten when the function is called.

Check the code registered under the name ʻindividual` again.

toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attribute, n=IND_SIZE)

You need to look at the tools.initRepeat reference (https://deap.readthedocs.io/en/master/api/tools.html#deap.tools.initRepeat) to see what this is doing There is.

deap.tools.initRepeat(container, func, n) Call the func function n times and call the container function with that as an argument.

The position corresponding to the container position is ʻIndividual. This was a list (a class that inherited). func is toolbox.attribute. This is just another name for random.random. From this, the following method is defined by calling Toolbox.register ("individual", tools.initRepeat, creator.Individual, toolbox.attribute, n = IND_SIZE) `.

def individual():
    return tools.initRepeat(creator.Individual, random.random, n=IND_SIZE)
    """
Further expansion of the initRepeat process results in the following
    return Individual([random.random() for _ in range(IND_SIZE)])
    """

In other words, by calling ʻindividual ()`, we define a method that generates 10 random numbers of [0, 1) and packs them into an individual. With this, "what to put in the list" is defined in response to "what to put in the list is not decided at this point" that I wrote earlier. The program structure is affected by whether to perform GA of binary genes or GA of real values, but it is one of the moe points that DEAP can relatively easily define "what actually enters the list". ..

Partial application

Well, the interesting part is from here. In Toolbox.register," Set the default argument to the passed function and give it an alias "is done. Why can this be done? What comes out here is the idea of ** partial application **. Let's take a look at Toolbox.register code for now. Paste the extracted one only in the important part.

py:deap/py:base.From py


    def register(self, alias, function, *args, **kargs):
        pfunc = partial(function, *args, **kargs)
        setattr(self, alias, pfunc)

partial is a function provided by the functools module.

functools.partial(func, /, *args, **keywords) Returns a new partial object. When called, this object behaves like a func called with the positional argument args and the keyword argument keywords.

Let's take a look at the definition of population to generate a population to explain what is fun with partial application. The definition of ʻindividual` is also reprinted for comparison.

toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attribute, n=IND_SIZE)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

In ʻindividual," all arguments are specified "in ʻinitRepeat, but in population, "n is not specified". Therefore, you must specify n when calling population.

pop = toolbox.population(n=50)

When solving a problem with GA, if you decide the gene expression (gene length) for the target problem, it usually does not change. On the other hand, if the number of individuals is small or large, it will be in the middle and "parameters to change in various ways". Therefore, n is specified each time. Partial application means that you can create a more specialized function using a general-purpose function and describe the process using it.

Setting the method in DEAP

Now that you know what the rest of the Overview is doing, I'll explain it for now.

He explained that GA will try various methods of selection, crossover, and mutation. Each method has method-specific parameters. On the other hand, in the case of crossing, "takes two individuals as arguments and returns two offspring", and in the case of mutation, "takes one individual as an argument and returns one mutated individual". If it is selected, "general input / output of operation" such as "takes the population and the number of survivors as arguments and returns the surviving individuals" is decided. Partial application is also used for these, and a method is created in which the parameters unique to the method are partially applied to each of the prepared methods.

def evaluate(individual):
    return sum(individual),

toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=1, indpb=0.1)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("evaluate", evaluate)

At this time, it is important to name mate, mutate, select, evaluate when using "the whole algorithm that turns GA". EaSimple, which is one of the "overall algorithms", says as follows: ..

deap.algorithms.eaSimple(population, toolbox, cxpb, mutpb, ngen[, stats, halloffame, verbose]) This function assumes that mate, mutate, select and evaluate are registered in the toolbox.

It means that it can be called properly by giving a name that the framework side expects.

Suppression of recalculation

I wrote it for a long time and forgot about it myself, but the last point is the moe point of base.Fitness. If you look at Tutorials: Part 2, you will find something interesting.

ind1.fitness.values = evaluate(ind1)
print ind1.fitness.valid    # True

Why does valid become True when assigned to values? This is achieved with ** Properties **. Furthermore, values themselves are actually properties. This is a karakuri that when the assignment is made, the one multiplied by the weights defined at the beginning is retained. [^ 3]

[^ 3]: In order to handle both the minimization problem and the maximization problem in a unified manner by reversing the sign.

Implementation of valid Implementation of values

Why is there something called valid like this? This is because the crossover rate and mutation rate are set for crossovers and mutations, respectively, and in reality, "genes that did not cross or mutate can be created". Since they are the same gene, the objective function value is naturally the same. In that case, it is useless to recalculate and to avoid it [^ 4].

[^ 4]: It seems that some test questions take a few minutes if they are real questions. It's about the computing power that I heard a long time ago.

Summary

So far, we've looked at the deep backside of DEAP. Some of the Python mysteries that you don't usually see and that you might be able to use somewhere if you know them are:

--Dynamic definition of class using type function --Partial application using functiontools.partial

I love frameworks that make full use of language features, but it feels like DEAP is also making full use of Python.

Recommended Posts

Deep Python learned from DEAP
sql from python
Python Deep Learning
MeCab from Python
Deep learning × Python
Deep Learning from scratch-Chapter 4 tips on deep learning theory and implementation learned in Python
Python vs Ruby "Deep Learning from scratch" Summary
Touch MySQL from Python 3
Operate Filemaker from Python
Use fluentd from python
Python: Deep Learning Practices
Deep Learning from scratch
Access bitcoind from python
Changes from Python 3.0 to Python 3.5
Changes from Python 2 to Python 3.0
Python from or import
Use MySQL from Python
Run python from excel
Install python from source
Execute command from Python
Deep Learning from scratch The theory and implementation of deep learning learned with Python Chapter 3
Iptables learned from documentation
Operate neutron from Python!
Use MySQL from Python
Operate LXC from Python
Manipulate riak from python
Force Python from Fortran
Python: Deep Learning Tuning
Use BigQuery from python.
Execute command from python
[Python] Read From Stdin
Use mecab-ipadic-neologd from python
Flatten using Python yield from
Call CPLEX from Python (DO cplex)
Python evolutionary computation library Deap
Deep Learning from scratch 1-3 chapters
Python Evolutionary Computation Library Deap (3)
Post from Python to Slack
Refactoring Learned in Python (Basic)
Grammar features added from Python3.6
Cheating from PHP to Python
Programming Learned from Books May 8
Make MeCab available from Python3
Information obtained from tweet_id (Python)
OCR from PDF in Python
Run illustrator script from python
Use MySQL from Anaconda (python)
Python shallow copy and deep copy
Anaconda updated from 4.2.0 to 4.3.0 (python3.5 updated to python3.6)
Study from Python Hour4: Object-oriented ②
Query Athena from Lambda Python
Access Oracle DB from Python
Python shallow and deep copy
Study from Python Hour3: Functions
Start / stop GCE from python
Programming Learned from Books May 9
Python classes learned in chemoinformatics
Stop Omxplayer from Python code
Switch from python2.7 to python3.6 (centos7)
Connect to sqlite from python
Programming learned from books May 11