[PYTHON] Find golf expectations using dynamic optimization

things to do

--Easy confirmation of dynamic optimization --Calculation of expected value of golf

Easy confirmation of dynamic optimization

What is dynamic optimization?

――Dynamic Optimization is a recent term for [Dynamic Programming](Dynamic Programming). --Solve a partial problem and use the result to solve the whole problem. At that time, for the same subproblem that appears repeatedly, the result cache is used to efficiently calculate (memoize).

Easy confirmation with knapsack problem

In Python, lru_cache makes it easy to cache results. (If the argument can be hashed) Let's see the effect on the knapsack problem.

python


%matplotlib inline
import numpy as np, matplotlib.pyplot as plt
from functools import lru_cache
plt.rcParams['font.family'] = 'IPAexGothic'
plt.rcParams['font.size'] = 16

count = 0
np.random.seed(1)
_size = np.random.randint(100, 200, 100)
_weight = np.random.randint(100, 200, 100)

def make_sample(n):
    size = tuple(_size[:n])
    weight = tuple(_weight[:n])
    capacity = sum(size) // 3 * 2
    return size, weight, capacity

def knapsack1(size, weight, capacity):
    if len(size) == 0 or capacity < min(size):
        return 0
    global count
    count += 1
    r = capacity - size[0]
    if r < 0:
        return knapsack1(size[1:], weight[1:], capacity)
    else:
        return max(weight[0] + knapsack1(size[1:], weight[1:], r),
                    knapsack1(size[1:], weight[1:], capacity))

@lru_cache(None)
def knapsack2(size, weight, capacity):
    if len(size) == 0 or capacity < min(size):
        return 0
    global count
    count += 1
    r = capacity - size[0]
    if r < 0:
        return knapsack2(size[1:], weight[1:], capacity)
    else:
        return max(weight[0] + knapsack2(size[1:], weight[1:], r),
                    knapsack2(size[1:], weight[1:], capacity))

def count_knapsack1(n):
    global count
    count = 0
    knapsack1(*make_sample(n))
    return count

def count_knapsack2(n):
    global count
    count = 0
    knapsack2(*make_sample(n))
    return count

Here, we divide the knapsack problem into subproblems depending on whether the first item is selected or not, and solve them recursively.

--count_knapsack1 calculates the number of uncached calls. --count_knapsack2 calculates the number of cached calls.

The only difference is the presence or absence of lru_cache. Let's see the result.

python


rng = [10, 12, 14, 16, 18, 20]
res1 = [count_knapsack1(i) for i in rng]
res2 = [count_knapsack2(i) for i in rng]

plt.plot(rng, res1, label='No cache')
plt.plot(rng, res2, label='With cache')
plt.xlabel('Item count')
plt.ylabel('Number of calls')
plt.yscale('log')
plt.legend(loc='lower right');

image

Since the vertical axis is the logarithmic axis, you can see that the cache speeds up considerably.

Calculation of expected value of golf

Next, consider a simple golf problem.

――You and your opponent compete for a score of 18 holes. If you win, you get +1 point, if you lose, you get -1 point, and if you draw, you get 0 points. ――It is 18 holes, but all the conditions are the same. ――The opponent has a 20% chance of becoming a bogey, a 60% chance of becoming a par, and a 20% chance of becoming a birdie. ――You must choose either safety measures or hard-line measures for each hole. --The safeguard is 10% chance to be bogey, 80% chance to be par, and 10% chance to be birdie. ――The hard-line strategy is 30% chance to be bogey, 40% chance to be par, and 30% chance to be birdie.

Let's tabulate the possibilities (%) for each measure.

Safety measures Hard measures
phase \ self + 1 0 -1 phase \ Self + 1 0 -1
+12162+1686
06486+0182418
-12162-1686

Looking at the score difference, there are 5 ways: [-2, -1, 0, +1, +2]. Subset sum problems are created by dividing these 10 ways (5 safety measures + 5 hard measures).

python


a0 = np.arange(-2, 3)
a1 = [0.02, 0.22, 0.52, 0.22, 0.02]
a2 = [0.06, 0.26, 0.36, 0.26, 0.06]
@lru_cache(None)
def golf(rem, df):
    """
    rem:Number of holes remaining
    df:Current score difference(Negative wins)
Return value:Whether to take safety measures,Expected score
    """
    if rem == 1: #Final hole
        s1 = np.inner(a1, (a0+df)<0) - np.inner(a1, (a0+df)>0)
        s2 = np.inner(a2, (a0+df)<0) - np.inner(a2, (a0+df)>0)
    else:
        a = [golf(rem-1, df+i)[1] for i in a0]
        s1 = np.inner(a, a1)
        s2 = np.inner(a, a2)
    return s1 >= s2, max(s1, s2)

rng = range(18,0,-1)
plt.xlabel('Number of holes remaining')
plt.ylabel('Expected score')
plt.plot(range(18), [golf(i, 0)[1] for i in rng]);
plt.xticks(range(18), rng);

image

The more holes you have left, the higher your expected score. This is thought to be due to taking safety measures when it is advantageous and hard measures when it is disadvantageous.

that's all

Recommended Posts

Find golf expectations using dynamic optimization
Try function optimization using Hyperopt
Notes on optimization using Pytorch