--Easy confirmation of dynamic optimization --Calculation of expected value of golf
――Dynamic Optimization is a recent term for [Dynamic Programming](Dynamic Programming). --Solve a partial problem and use the result to solve the whole problem. At that time, for the same subproblem that appears repeatedly, the result cache is used to efficiently calculate (memoize).
In Python, lru_cache makes it easy to cache results. (If the argument can be hashed) Let's see the effect on the knapsack problem.
python
%matplotlib inline
import numpy as np, matplotlib.pyplot as plt
from functools import lru_cache
plt.rcParams['font.family'] = 'IPAexGothic'
plt.rcParams['font.size'] = 16
count = 0
np.random.seed(1)
_size = np.random.randint(100, 200, 100)
_weight = np.random.randint(100, 200, 100)
def make_sample(n):
size = tuple(_size[:n])
weight = tuple(_weight[:n])
capacity = sum(size) // 3 * 2
return size, weight, capacity
def knapsack1(size, weight, capacity):
if len(size) == 0 or capacity < min(size):
return 0
global count
count += 1
r = capacity - size[0]
if r < 0:
return knapsack1(size[1:], weight[1:], capacity)
else:
return max(weight[0] + knapsack1(size[1:], weight[1:], r),
knapsack1(size[1:], weight[1:], capacity))
@lru_cache(None)
def knapsack2(size, weight, capacity):
if len(size) == 0 or capacity < min(size):
return 0
global count
count += 1
r = capacity - size[0]
if r < 0:
return knapsack2(size[1:], weight[1:], capacity)
else:
return max(weight[0] + knapsack2(size[1:], weight[1:], r),
knapsack2(size[1:], weight[1:], capacity))
def count_knapsack1(n):
global count
count = 0
knapsack1(*make_sample(n))
return count
def count_knapsack2(n):
global count
count = 0
knapsack2(*make_sample(n))
return count
Here, we divide the knapsack problem into subproblems depending on whether the first item is selected or not, and solve them recursively.
--count_knapsack1 calculates the number of uncached calls. --count_knapsack2 calculates the number of cached calls.
The only difference is the presence or absence of lru_cache. Let's see the result.
python
rng = [10, 12, 14, 16, 18, 20]
res1 = [count_knapsack1(i) for i in rng]
res2 = [count_knapsack2(i) for i in rng]
plt.plot(rng, res1, label='No cache')
plt.plot(rng, res2, label='With cache')
plt.xlabel('Item count')
plt.ylabel('Number of calls')
plt.yscale('log')
plt.legend(loc='lower right');
Since the vertical axis is the logarithmic axis, you can see that the cache speeds up considerably.
Next, consider a simple golf problem.
――You and your opponent compete for a score of 18 holes. If you win, you get +1 point, if you lose, you get -1 point, and if you draw, you get 0 points. ――It is 18 holes, but all the conditions are the same. ――The opponent has a 20% chance of becoming a bogey, a 60% chance of becoming a par, and a 20% chance of becoming a birdie. ――You must choose either safety measures or hard-line measures for each hole. --The safeguard is 10% chance to be bogey, 80% chance to be par, and 10% chance to be birdie. ――The hard-line strategy is 30% chance to be bogey, 40% chance to be par, and 30% chance to be birdie.
Let's tabulate the possibilities (%) for each measure.
Safety measures td> | Hard measures td> tr> | |||||||
phase \ self td> | + 1 td> | 0 td> | -1 td> | td> | phase \ Self td> | + 1 td> | 0 td> | -1 td> tr> |
+1 | 2 | 16 | 2 | +1 | 6 | 8 | 6 | |
0 | 6 | 48 | 6 | +0 | 18 | 24 | 18 | |
-1 | 2 | 16 | 2 | -1 | 6 | 8 | 6 |
Looking at the score difference, there are 5 ways: [-2, -1, 0, +1, +2]. Subset sum problems are created by dividing these 10 ways (5 safety measures + 5 hard measures).
python
a0 = np.arange(-2, 3)
a1 = [0.02, 0.22, 0.52, 0.22, 0.02]
a2 = [0.06, 0.26, 0.36, 0.26, 0.06]
@lru_cache(None)
def golf(rem, df):
"""
rem:Number of holes remaining
df:Current score difference(Negative wins)
Return value:Whether to take safety measures,Expected score
"""
if rem == 1: #Final hole
s1 = np.inner(a1, (a0+df)<0) - np.inner(a1, (a0+df)>0)
s2 = np.inner(a2, (a0+df)<0) - np.inner(a2, (a0+df)>0)
else:
a = [golf(rem-1, df+i)[1] for i in a0]
s1 = np.inner(a, a1)
s2 = np.inner(a, a2)
return s1 >= s2, max(s1, s2)
rng = range(18,0,-1)
plt.xlabel('Number of holes remaining')
plt.ylabel('Expected score')
plt.plot(range(18), [golf(i, 0)[1] for i in rng]);
plt.xticks(range(18), rng);
The more holes you have left, the higher your expected score. This is thought to be due to taking safety measures when it is advantageous and hard measures when it is disadvantageous.
that's all