This is the third day article of Algorithm Advent Calendar 2015.
We will introduce a brief introduction of design of experiments and an approach by combinatorial optimization as its development.
--Suppose you want to do some analysis from the sensor information. ――Sensors are available for 10,000 yen and 30,000 yen, and may not be placed, so there are three types of choices. ――There are 20 candidates for the sensor installation location. ――The total purchase cost of all sensors must be kept below 50,000 yen. ――Suppose you want to know where and how many sensors should be placed for efficient verification.
As an example of the case, "A sensor of 10,000 yen is placed at points A and B, and a sensor of 30,000 yen is placed at point C".
Use the following terms.
—— Factor: The subject to be examined for which you want to determine the level. This time, it is a candidate for sensor placement. --Level: Possible value of the factor. This time, there are three types of sensor costs: 0,000 yen, 10,000 yen, and 30,000 yen. --Interaction: The effect of one factor depends on the level of another factor. --Latin square: A matrix of numbers that does not have the same numbers in its rows or columns. -Design of experiments: A method for conducting efficient experiments. Here, we aim to reduce the number of cases.
This time, it is assumed that there is no interaction.
Apart from the background story, as a general theory, I will show an example with three types of factors and three types of levels. Simply put, you need $ 3 ^ 3 = 27 $ as many cases.
Design of experiments allows you to create a number of cases based on the Latin square (shown below).
1 | 2 | 3 |
---|---|---|
2 | 3 | 1 |
3 | 1 | 2 |
If you use the Latin square, you can do it efficiently in 9 cases as follows. These 9 cases include all combinations of factors / levels, and the number of all factors / levels is the same.
9 cases selected (level for each factor)
Since each of the 9 squares in the Latin square corresponds to the case, consider the row number as factor 1, the column number as factor 2, and the square number as factor 3.
Design of experiments has the following drawbacks:
--Only a specific number of factors and levels can be used. --No additional conditions can be considered.
In the design of experiments, the number of cases was selected to be as small as possible after securing a certain number of factors / levels. You can think of the same as an optimization problem. Specifically, specify the minimum number of cases that include each level of each factor, and find the method of selecting cases by mixed integer optimization.
Formulation
Objective function | Required number of cases | |
---|---|---|
Constraints | There must be at least the minimum number of cases for each factor and each level | |
variable | Whether to select for each case |
$ C_ {ij} $ represents the $ j $ th level in the $ i $ th case, and $ N $ represents the lowest number.
Using Python, it can be calculated as follows. (It is assumed that the variable cases contains the total cases with a purchase cost of 50,000 yen or less.)
python
from pulp import *
r = [0, 1, 3] #level
for N in [20, 40, 60, 120]: #Minimum number
m = LpProblem()
v = [LpVariable('v%d'%i, cat=LpBinary) for i in range(len(cases))]
m += lpSum(v)
for j in range(len(cases[0])):
for k in r:
m += lpSum(v[i] for i in range(len(cases)) if cases[i][j] == k) >= N
m.solve()
print(N, LpStatus[m.status], value(m.objective)) #Minimum number, status, number of cases
Let's compare it with the case of random sampling. The minimum number of random samplings is an approximate value obtained in the simulation. You can see that the case could be selected efficiently by using combinatorial optimization.
Minimum number | Combinatorial optimization | Random sampling |
---|---|---|
20 | 400 cases | 3200 case |
40 | 800 cases | 5700 case |
60 | 1200 cases | 7900 case |
120 | 2400 case | 14400 case |
Recommended Posts