[PYTHON] Let's decide the winner of bingo

problem

I will do bingo at the company's year-end party. You were asked to develop a system for bingo. However, it has the following requirements (fiction).

-"7 prizes" (→ 7 winners) ―― "Please keep the ratio of winners between men and women as constant as possible." ―― “Please keep the ratio of winners in each department as constant as possible” -"Set the winning probability (weight) of young people, veterans, and officers to 1, 2, and 3."

(Reference: Similar story)

Way of thinking

Combinatorial optimization allows you to find combinations that take into account various conditions. Winners are selected and arranged in order according to the winning probability. According to the order, the $ i $ th score is $ 2 ^ {1-i} $. By maximizing the sum of the scores, we will select the winners according to the winning probability as much as possible. The ratio of men and women and the ratio of departments are subtracted from the objective function by penalizing the amount that deviates from the ratio.

Solve with Python

Load the participant information into pandas.DataFrame (if the code can't be executed, download the CSV from the URL and load it).

import numpy as np, pandas as pd, urllib
from pulp import lpDot, lpSum
from ortoolpy import model_max, addbinvars, addvars, addvals

url = 'https://www.dropbox.com/s/refo0vfj5wbmv2h/bingo.csv?dl=1'
with urllib.request.urlopen(url) as fp:
    df = pd.read_csv(fp)

Score and variable creation

Each row of DataFrame (df) is a participant. The winning probability (Rate column) is used as a ratio to win in order with numpy.random.choice. Calculate the score with that order as the index and make it into the Score column. Optimization maximizes the sum of the scores.

Add a column Var of variables that indicate whether or not you have won. Also, pena1 is a penalty that deviates from the ratio of men and women. Let pena2 be the amount of penalty that deviates from the affiliation ratio.

nn = len(df)  #Number of people
rnd = np.random.choice(df.index, nn, False, df.Rate / df.Rate.sum())
df['Score'] = pd.Series(2.0**-np.arange(nn), index=rnd)  #Score
addbinvars(df)  #Added variable (whether winning or not) as Var column
pena1, pena2 = addvars(2)  #Two penalties for deviating from the ratio
print(df)

	Name	IsMale	Div	Rate	Score	Var
0	Hidehito Asakura	True	General Affairs Department	2	0.000488	v000001
1	Ryo Miyagawa	True	Accounting department	1	0.001953	v000002
2	Kenji Furuhashi	True	General Affairs Department	2	0.000977	v000003
3	Motomura sentence	False	General Affairs Department	3	0.000015	v000004
4	Keiji Otsubo	True	Accounting department	1	0.000002	v000005
5	Kazutaka Hatakeyama	True	General Affairs Department	2	0.000031	v000006
6	Wakana Takeuchi	False	Accounting department	2	0.000008	v000007
7	Yoda Kanae	False	Human Resources Department	2	0.015625	v000008
8	Keisuke Furuta	True	Human Resources Department	3	0.125000	v000009
9	Kei Ishikawa	True	Human Resources Department	1	0.000061	v000010
10	Eriko Tominaga	False	Human Resources Department	2	1.000000	v000011
11	Tobita Sato	False	General Affairs Department	1	0.003906	v000012
12	Seito Kamiyama	True	Human Resources Department	2	0.250000	v000013
13	Ryuichiro Tabata	True	General Affairs Department	3	0.500000	v000014
14	Rie Kamei	False	Accounting department	1	0.000004	v000015
15	Sone Haruka	False	General Affairs Department	1	0.000244	v000016
16	Kasuga Kiyoshiro	True	Accounting department	2	0.007812	v000017
17	Akisa Nakai	False	Accounting department	3	0.062500	v000018
18	Koho Kaneshiro	False	Human Resources Department	2	0.031250	v000019
19	Yoshikatsu Kakinuma	True	General Affairs Department	1	0.000122	v000020

Check the number of people by gender and department.

print(df.groupby('IsMale').Div.value_counts())

IsMale  Div
False Human Resources Department 3
Accounting Department 3
General Affairs Department 3
True General Affairs Department 5
Human Resources Department 3
Accounting Department 3
Name: Div, dtype: int64

Create a mathematical model and solve it

--Purpose function: Sum of scores --Number of people x (Amount of gender ratio deviation + Department ratio deviation amount) --Constraints --The number of winners is 7 ――The ratio by gender is constant --Constant ratio by department

The penalty coefficient is appropriate.

ns = 7  #Number of winners

m = model_max()  #Mathematical model
m += lpDot(df.Score, df.Var) - nn * (pena1 + pena2)  #Objective function
m += lpSum(df.Var) == ns  #Number of winners
for _, gr in df.groupby('IsMale'):  #Keep the ratio by gender constant
    m += lpSum(gr.Var) <= ns * len(gr) / nn + pena1
for _, gr in df.groupby('Div'):  #Keep the ratio by department constant
    m += lpSum(gr.Var) <= ns * len(gr) / nn + pena2
m.solve()  #Solution
addvals(df)  #Add result as Val column
print(df[df.Val > 0])  #Winner

	Name	IsMale	Div	Rate	Score	Var	Val
10	Eriko Tominaga	False	Human Resources Department	2	1.000000	v000011	1.0
11	Tobita Sato	False	General Affairs Department	1	0.003906	v000012	1.0
12	Seito Kamiyama	True	Human Resources Department	2	0.250000	v000013	1.0
13	Ryuichiro Tabata	True	General Affairs Department	3	0.500000	v000014	1.0
16	Kasuga Kiyoshiro	True	Accounting department	2	0.007812	v000017	1.0
17	Akisa Nakai	False	Accounting department	3	0.062500	v000018	1.0
19	Yoshikatsu Kakinuma	True	General Affairs Department	1	0.000122	v000020	1.0

Winners have been decided. Let's look at the number of people by gender and department.

print(df[df.Val > 0].groupby('IsMale').Div.value_counts())

IsMale  Div
False Human Resources Department 1
Accounting Department 1
General Affairs Department 1
True General Affairs Department 2
Human Resources Department 1
Accounting Department 1
Name: Div, dtype: int64

Looks good.

Summary

--Combinatorial optimization is a method for finding combinations that satisfy various conditions. --The order of winning is decided according to the winning probability, and the score is [1, 0.5, 0.25, ...] for each order. By maximizing the sum of these scores, we will try to select the winners in a predetermined order.

(Personally, I hope there is no demand for this article)

that's all