[PYTHON] Design of experiments and combinatorial optimization

This is the third day article of Algorithm Advent Calendar 2015.

Introduction

We will introduce a brief introduction of design of experiments and an approach by combinatorial optimization as its development.

background

--Suppose you want to do some analysis from the sensor information. ――Sensors are available for 10,000 yen and 30,000 yen, and may not be placed, so there are three types of choices. ――There are 20 candidates for the sensor installation location. ――The total purchase cost of all sensors must be kept below 50,000 yen. ――Suppose you want to know where and how many sensors should be placed for efficient verification.

As an example of the case, "A sensor of 10,000 yen is placed at points A and B, and a sensor of 30,000 yen is placed at point C".

the term

Use the following terms.

—— Factor: The subject to be examined for which you want to determine the level. This time, it is a candidate for sensor placement. --Level: Possible value of the factor. This time, there are three types of sensor costs: 0,000 yen, 10,000 yen, and 30,000 yen. --Interaction: The effect of one factor depends on the level of another factor. --Latin square: A matrix of numbers that does not have the same numbers in its rows or columns. -Design of experiments: A method for conducting efficient experiments. Here, we aim to reduce the number of cases.

This time, it is assumed that there is no interaction.

How to find the number of cases using design of experiments.

Apart from the background story, as a general theory, I will show an example with three types of factors and three types of levels. Simply put, you need $ 3 ^ 3 = 27 $ as many cases.

Design of experiments allows you to create a number of cases based on the Latin square (shown below).

1 2 3
2 3 1
3 1 2

If you use the Latin square, you can do it efficiently in 9 cases as follows. These 9 cases include all combinations of factors / levels, and the number of all factors / levels is the same.

9 cases selected (level for each factor)

Since each of the 9 squares in the Latin square corresponds to the case, consider the row number as factor 1, the column number as factor 2, and the square number as factor 3.

Disadvantages of design of experiments

Design of experiments has the following drawbacks:

--Only a specific number of factors and levels can be used. --No additional conditions can be considered.

Thinking by combinatorial optimization

In the design of experiments, the number of cases was selected to be as small as possible after securing a certain number of factors / levels. You can think of the same as an optimization problem. Specifically, specify the minimum number of cases that include each level of each factor, and find the method of selecting cases by mixed integer optimization.

Formulation

Objective function \sum_i{v_i} Required number of cases
Constraints \sum_i{\\{C_{ij} v_i | C_{ij} = k\\}} \ge N \forall j \lt 3, \forall k \in \\{0, 1, 3\\} There must be at least the minimum number of cases for each factor and each level
variable v_i \in \\{0, 1\\} \forall i \lt total number of cases Whether to select for each case

$ C_ {ij} $ represents the $ j $ th level in the $ i $ th case, and $ N $ represents the lowest number.

Using Python, it can be calculated as follows. (It is assumed that the variable cases contains the total cases with a purchase cost of 50,000 yen or less.)

python


from pulp import *
r = [0, 1, 3] #level
for N in [20, 40, 60, 120]: #Minimum number
    m = LpProblem()
    v = [LpVariable('v%d'%i, cat=LpBinary) for i in range(len(cases))]
    m += lpSum(v)
    for j in range(len(cases[0])):
        for k in r:
            m += lpSum(v[i] for i in range(len(cases)) if cases[i][j] == k) >= N
    m.solve()
    print(N, LpStatus[m.status], value(m.objective)) #Minimum number, status, number of cases

result

Let's compare it with the case of random sampling. The minimum number of random samplings is an approximate value obtained in the simulation. You can see that the case could be selected efficiently by using combinatorial optimization.

Minimum number Combinatorial optimization Random sampling
20 400 cases 3200 case
40 800 cases 5700 case
60 1200 cases 7900 case
120 2400 case 14400 case

Recommended Posts

Design of experiments and combinatorial optimization
The power of combinatorial optimization solvers
Separation of design and data in matplotlib
Use combinatorial optimization
Combinatorial optimization --Typical problem-- Partition of a set problem
Combinatorial optimization to find the hand of "Millijan"
Minimize the number of polishings by combinatorial optimization
Judging the finish of mahjong by combinatorial optimization
Grouping by combinatorial optimization
Let's decide the position of the fire station by combinatorial optimization
Mechanism of pyenv and virtualenv
Pre-processing and post-processing of pytest
Combinatorial optimization to investigate stars
Consistency of scikit-learn API design
Combination of recursion and generator
Grouping games with combinatorial optimization
Combination of anyenv and direnv
Explanation and implementation of SocialFoceModel
Differentiation of sort and generalization of sort
Coexistence of pyenv and autojump
Use and integration of "Shodan"
Problems of liars and honesty
Machine learning and mathematical optimization
Occurrence and resolution of tensorflow.python.framework.errors_impl.FailedPreconditionError
Comparison of Apex and Lamvery
Source installation and installation of Python
Introduction and tips of mlflow.Tracking
Solving Knapsack Problems with Google's OR-Tools-Practicing the Basics of Combinatorial Optimization Problems