[PYTHON] Create an academic society program with combinatorial optimization

Create an academic conference program

** You **, the executive committee member of the research presentation of the academic society, has decided to create a program for the research presentation.

--There were applications for 60 presentations. --Each presentation has one or two related ** keywords **. ――The presentations of the four people will be combined into one session, and a total of 15 sessions will be created. --Each session has one session ** theme **. --The presentation in the session shall have the theme as a keyword. ――You have to decide the session theme for each application and divide it into 15 sessions.

Way of thinking

--Each keyword of all presenters is a candidate for selection. --For the keyword, it is assumed that the degree of relevance to the announcement is set by (0-1) [^ 1]. -** We will maximize ** the "sum of relevance of selected candidates" **. --Each presentation must be ** assigned to one of the sessions **. ――For that purpose, some candidates will have an arbitrary category as a dummy for each presentation. --The relevance of any category is very small (-10).

[^ 1]: For example, it is possible to calculate with Word2Vec.

You can solve the above problem by using Combinatorial optimization. Let's formulate it.

Maximize $ \ sum_i {Relevance_i x_i} $ Sum of relevance of assigned candidates
Variables $ x_i \ in \ {0, 1 \} $ $ x_i $: $ i $ Select th candidate Whether
$ y_j \ in Integer greater than or equal to 0 $ $ y_j $: $ j $ Number of sessions in the th category
Constraints $ \ sum_j {y_j} = 15 $ Total number of sessions is 15
$ \ sum_ {i \ in F_h} {x_i} = 1 ~ ~ ~ \ forall h \ in H $ Each presentation is assigned exactly one keyword td>
$ \ sum_ {i \ in G_k} {x_i} \ le 4 y_j ~ ~ ~ \ forall k \ in C $ The number of presentations in each category is less than the number of frames td>

However, $ H $ is a set of presenters, $ F_h $ is a set of candidates for presenter $ h $, $ C $ is a set of categories, and $ G_k $ is a set of candidates for category $ k $.

Run by Python

Let's make the application table.

python3


import numpy as np, pandas as pd
from pulp import *
np.random.seed(3)
nu = 4 #Number of presentations per session
nr = 60 #Number of presenters
cat = 'Communication Medical Logistics Electric Power Civil Engineering Physical Chemistry Geometric Algebra Earth Science Biology'.split() #category
ns = nr / nu #Number of sessions
dat = [(i, j, np.random.rand()) for i in  range(nr)
       for j in np.random.choice(cat, np.random.randint(1, 3), replace=False)]
dat.extend([(i, 'Any', -10) for i in range(nr)]) # Anyカテゴリの追加
a = pd.DataFrame(dat, columns=['Presenter', 'category', 'Relevance']) #Candidate
a['vx'] = [LpVariable('vx%d'%i, cat=LpBinary) for i in a.index] #Which line to choose
print(a[:3])
Presenter Category Relevance vx
0 0 Physics 0.207243 vx0
1 1 power 0.492636 vx1
2 1 creatures 0.913301 vx2

vx is the column that corresponds to the variable $ x $.

Let's tabulate the categories.

python3


b = pd.DataFrame(cat, columns=['name'])
b['vy'] = [LpVariable('vy%d'%j, cat=LpInteger, lowBound=0) for j in b.index] #Number of sessions
print(b[:3])
Name vy
0 Communication vy0
1 Medical vy1
2 Logistics vy2

vy is the column that corresponds to the variable $ y $.

Formulate and solve, and see the assignments that have become arbitrary categories.

python3


m = LpProblem(sense=LpMaximize)
m += lpDot(a.Relevance, a.vx)
m += lpSum(b.vy) == ns #Total number of sessions is equal
for i in range(nr):
    m += lpSum(a.vx[a.Presenter==i]) == 1 # Presenterは1カテゴリを選ぶ
for _, r in b.iterrows():
    m += lpSum(a.vx[a.category==r.name]) <= r.vy * nu #Announcement is less than the number of frames
m.solve()
a['rx'] = a.vx.apply(value) #Allocation result
b['ry'] = b.vy.apply(value) #Session count result
print(a[(a.rx > 0)&(a.category=='Any')])
Presenter Category Relevance vx rx
117 26 optional -10.0 vx117 1.0

rx is the column that corresponds to the result of the variable $ x $. Presenter "26" has assigned it to any category.

Let's look at the number of presentations for each category.

python3


print(a.category[(a.rx>0)].value_counts())
Category
Communication 12
Power 8
Physics 8
Biology 7
Geometry 4
Civil engineering 4
Logistics 4
Medical 4
Earth science 4
Chemistry 4
Optional 1

The biology category isn't a multiple of 4, so the crazy 26 would be in here.

that's all

Recommended Posts