[PYTHON] Let's decide the winner of bingo

problem

I will do bingo at the company's year-end party. You were asked to develop a system for bingo. However, it has the following requirements (fiction).

-"7 prizes" (→ 7 winners) ―― "Please keep the ratio of winners between men and women as constant as possible." ―― “Please keep the ratio of winners in each department as constant as possible” -"Set the winning probability (weight) of young people, veterans, and officers to 1, 2, and 3."

(Reference: Similar story)

Way of thinking

Combinatorial optimization allows you to find combinations that take into account various conditions. Winners are selected and arranged in order according to the winning probability. According to the order, the $ i $ th score is $ 2 ^ {1-i} $. By maximizing the sum of the scores, we will select the winners according to the winning probability as much as possible. The ratio of men and women and the ratio of departments are subtracted from the objective function by penalizing the amount that deviates from the ratio.

Solve with Python

Load the participant information into pandas.DataFrame (if the code can't be executed, download the CSV from the URL and load it).

import numpy as np, pandas as pd, urllib
from pulp import lpDot, lpSum
from ortoolpy import model_max, addbinvars, addvars, addvals

url = 'https://www.dropbox.com/s/refo0vfj5wbmv2h/bingo.csv?dl=1'
with urllib.request.urlopen(url) as fp:
    df = pd.read_csv(fp)

Score and variable creation

Each row of DataFrame (df) is a participant. The winning probability (Rate column) is used as a ratio to win in order with numpy.random.choice. Calculate the score with that order as the index and make it into the Score column. Optimization maximizes the sum of the scores.

Add a column Var of variables that indicate whether or not you have won. Also, pena1 is a penalty that deviates from the ratio of men and women. Let pena2 be the amount of penalty that deviates from the affiliation ratio.

nn = len(df)  #Number of people
rnd = np.random.choice(df.index, nn, False, df.Rate / df.Rate.sum())
df['Score'] = pd.Series(2.0**-np.arange(nn), index=rnd)  #Score
addbinvars(df)  #Added variable (whether winning or not) as Var column
pena1, pena2 = addvars(2)  #Two penalties for deviating from the ratio
print(df)
Name IsMale Div Rate Score Var
0 Hidehito Asakura True General Affairs Department 2 0.000488 v000001
1 Ryo Miyagawa True Accounting department 1 0.001953 v000002
2 Kenji Furuhashi True General Affairs Department 2 0.000977 v000003
3 Motomura sentence False General Affairs Department 3 0.000015 v000004
4 Keiji Otsubo True Accounting department 1 0.000002 v000005
5 Kazutaka Hatakeyama True General Affairs Department 2 0.000031 v000006
6 Wakana Takeuchi False Accounting department 2 0.000008 v000007
7 Yoda Kanae False Human Resources Department 2 0.015625 v000008
8 Keisuke Furuta True Human Resources Department 3 0.125000 v000009
9 Kei Ishikawa True Human Resources Department 1 0.000061 v000010
10 Eriko Tominaga False Human Resources Department 2 1.000000 v000011
11 Tobita Sato False General Affairs Department 1 0.003906 v000012
12 Seito Kamiyama True Human Resources Department 2 0.250000 v000013
13 Ryuichiro Tabata True General Affairs Department 3 0.500000 v000014
14 Rie Kamei False Accounting department 1 0.000004 v000015
15 Sone Haruka False General Affairs Department 1 0.000244 v000016
16 Kasuga Kiyoshiro True Accounting department 2 0.007812 v000017
17 Akisa Nakai False Accounting department 3 0.062500 v000018
18 Koho Kaneshiro False Human Resources Department 2 0.031250 v000019
19 Yoshikatsu Kakinuma True General Affairs Department 1 0.000122 v000020

Check the number of people by gender and department.

print(df.groupby('IsMale').Div.value_counts())
IsMale  Div
False Human Resources Department 3
Accounting Department 3
General Affairs Department 3
True General Affairs Department 5
Human Resources Department 3
Accounting Department 3
Name: Div, dtype: int64

Create a mathematical model and solve it

--Purpose function: Sum of scores --Number of people x (Amount of gender ratio deviation + Department ratio deviation amount) --Constraints --The number of winners is 7 ――The ratio by gender is constant --Constant ratio by department

ns = 7  #Number of winners

m = model_max()  #Mathematical model
m += lpDot(df.Score, df.Var) - nn * (pena1 + pena2)  #Objective function
m += lpSum(df.Var) == ns  #Number of winners
for _, gr in df.groupby('IsMale'):  #Keep the ratio by gender constant
    m += lpSum(gr.Var) <= ns * len(gr) / nn + pena1
for _, gr in df.groupby('Div'):  #Keep the ratio by department constant
    m += lpSum(gr.Var) <= ns * len(gr) / nn + pena2
m.solve()  #Solution
addvals(df)  #Add result as Val column
print(df[df.Val > 0])  #Winner
Name IsMale Div Rate Score Var Val
10 Eriko Tominaga False Human Resources Department 2 1.000000 v000011 1.0
11 Tobita Sato False General Affairs Department 1 0.003906 v000012 1.0
12 Seito Kamiyama True Human Resources Department 2 0.250000 v000013 1.0
13 Ryuichiro Tabata True General Affairs Department 3 0.500000 v000014 1.0
16 Kasuga Kiyoshiro True Accounting department 2 0.007812 v000017 1.0
17 Akisa Nakai False Accounting department 3 0.062500 v000018 1.0
19 Yoshikatsu Kakinuma True General Affairs Department 1 0.000122 v000020 1.0

Winners have been decided. Let's look at the number of people by gender and department.

print(df[df.Val > 0].groupby('IsMale').Div.value_counts())
IsMale  Div
False Human Resources Department 1
Accounting Department 1
General Affairs Department 1
True General Affairs Department 2
Human Resources Department 1
Accounting Department 1
Name: Div, dtype: int64

Looks good.

Summary

--Combinatorial optimization is a method for finding combinations that satisfy various conditions. --The order of winning is decided according to the winning probability, and the score is [1, 0.5, 0.25, ...] for each order. By maximizing the sum of these scores, we will try to select the winners in a predetermined order.

(Personally, I hope there is no demand for this article)

that's all

Recommended Posts

Let's decide the winner of bingo
Let's decide the position of the fire station by combinatorial optimization
Let's claim the possibility of pyenv-virtualenv in 2021
Let's summarize the construction of NFS server
Let's investigate the mechanism of Kaiji's cee-loline
Let's decide the date course by combinatorial optimization
The beginning of cif2cell
The meaning of self
Let's use the API of the official statistics counter (e-Stat)
Let's clear the ambiguity of the hyphen (-) of the su command now! !!
Let's break down the basics of TensorFlow Python code
the zen of Python
The story of sys.path.append ()
Let's use the Python version of the Confluence API module.
Let's use the open data of "Mamebus" in Python
Let's test the medical collapse hypothesis of the new coronavirus
[Ev3dev] Let's understand the mechanism of LCD (screen) control
Let's analyze the sentiment of Tweet using Chainer (1st)
Revenge of the Types: Revenge of types
[Python] Let's change the URL of the Django administrator site
Let's cross the wall of the left-handed coordinate system and the right-handed coordinate system.
Let's visualize the trading volume of TSE stocks --jpxlab sample
Pepper decides the winner by measuring the size of the audience's applause
Let's utilize the railway data of national land numerical information
Let's make the analysis of the Titanic sinking data like that
Let's execute the command on time with the bot of discord
Let's predict the timing of the bals and enjoy the movie slowly
Let's guess the development status of the city from the satellite image.
Let's visualize the number of people infected with coronavirus with matplotlib
Let's use the distributed expression of words quickly with fastText!
Align the version of chromedriver_binary
Scraping the result of "Schedule-kun"
10. Counting the number of lines
The story of building Zabbix 4.4
Towards the retirement of Python2
Let's search from the procession
[Apache] The story of prefork
Compare the fonts of jupyter-themes
About the ease of Python
Get the number of digits
Let's turn the air gacha
Explain the code of Tensorflow_in_ROS
Reuse the results of clustering
GoPiGo3 of the old man
Calculate the number of changes
Change the theme of Jupyter
The popularity of programming languages
Change the style of matplotlib
About the components of Luigi
Filter the output of tracemalloc
Simulation of the contents of the wallet
The Power of Pandas: Python
[Note] Let's try to predict the amount of electricity used! (Part 1)
Let's take a look at the feature map of YOLO v3
Let's measure the test coverage of pushed python code on GitHub.
[Python] Let's reduce the number of elements in the result in set operations
Let's summarize the degree of coupling between modules with Python code
[Word2vec] Let's visualize the result of natural language processing of company reviews