[PYTHON] Divide into teams by combinatorial optimization (minimize average deviation)

Introduction

Let's solve the problem introduced in Teaming by combinatorial optimization with another model.

This problem is a combinatorial optimization problem that minimizes variability. Variance is the most common measure of variability, but because it is a quadratic function, minimizing it is harder than solving a linear programming problem that minimizes a linear function. For that reason, the original article formulates the problem as a linear program that minimizes the difference between the maximum and minimum values (the range).

Another option is to minimize the mean (absolute) deviation instead of the range. In this article, we first compare the range model and the mean deviation model, and then solve an example of the mean deviation minimization problem with Python + PuLP.

Problem setting

We want the skills within each team and the total strength across teams to vary as little as possible. Therefore, the basic policy for the objective function is as follows.

$$ \text{minimize} \quad \sum_{j=1}^m (\text{variation within team } j) + (\text{variation between teams}) \tag{0.1} $$

You can also weight and balance either variation.

Formulation

We will formulate the problem. First, define the following variables.

\begin{align}
n &:Number of members\\
m &:Number of teams\\
l &:Number of skills\\
x_{i,j} &:Assignment of member i to team j(\in \{0,1\}) \\
a_{i,k} &:Member i skill k score\\
b_i &:Sum of all skills of member i(=\sum_{k=1}^l a_{i,k}) \\
\end{align}

Range minimization problem

Use the difference (range) between the maximum and minimum values as an indicator of variation and minimize it.

Here, define the auxiliary variables:

\begin{align}
y_{j,\min}, y_{j,\max} &:Minimum and maximum skill totals within team j\\
z_\min, z_\max &:Minimum and maximum team total scores across teams
\end{align}

Then, it can be formulated as a range minimization problem as follows.

\begin{align}
\text{minimize} \quad &\sum_{j=1}^m (y_{j \max} -y_{j \min}) + 1.5 (z_\max - z_\min) \tag{1.1}\\
\text{subject to} \quad & y_{j \min} \leq \sum_{i=1}^n a_{i,k} x_{i,j} \leq y_{j \max} &j=1,\ldots,m;\, k=1,\ldots,l \tag{1.2}\\
& z_\min \leq \sum_{i=1}^n b_i x_{i,j} \leq z_\max &j=1,\ldots,m \tag{1.3}\\
&\sum_{j=1}^m x_{i,j} = 1 &i=1,\ldots,n \tag{1.4}\\
& x_{i,j} \in \{0,1\} &i=1,\ldots,n;\, j=1,\ldots,m \tag{1.5}\\
\end{align}

Equation (1.4) is a constraint stating that every member belongs to exactly one team. Following the original article, the variation between teams is weighted by 1.5. The model has few variables and can be formulated very simply.

Average deviation minimization problem

If $x_i$ are the individual values and $\bar{x}$ is their mean, the variance and the mean deviation can be expressed as:

\begin{align}
\text{Variance} &: \frac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^2\\
\text{Mean deviation} &: \frac{1}{n}\sum_{i=1}^n |x_i-\bar{x}|
\end{align}

The mean deviation contains an absolute value, which is often avoided because it is not differentiable and is cumbersome to handle analytically. However, when the absolute value appears in the objective function of an optimization problem, it can easily be eliminated with auxiliary variables.
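As a quick illustration of this trick (a hypothetical toy problem, not part of the article's model), minimizing $|t-3|$ can be replaced by minimizing an auxiliary variable $u$ constrained so that $u \geq |t-3|$:

```python
from pulp import LpProblem, LpVariable, value

# Toy problem: minimize |t - 3| over 0 <= t <= 10
prob = LpProblem()
t = LpVariable('t', lowBound=0, upBound=10)
u = LpVariable('u', lowBound=0)  # auxiliary variable bounding |t - 3|

prob += u            # objective: minimize u
prob += t - 3 <= u   # u >= t - 3
prob += 3 - t <= u   # u >= -(t - 3); together these give u >= |t - 3|
prob.solve()

print(value(t), value(u))  # optimum: t = 3, u = 0
```

Since the objective pushes $u$ down as far as the two linear constraints allow, at the optimum $u$ equals $|t-3|$ exactly, and the problem stays linear.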

Rewriting equation (0.1) using the mean deviation gives:

\begin{align}
\text{minimize} \quad &\sum_{j=1}^m \Bigl(\sum_{k=1}^l \Bigl|\sum_{i=1}^n a_{i,k} x_{i,j} - \mu_j \Bigr|\Bigr) + 1.5\sum_{j=1}^m\Bigl|\sum_{i=1}^n b_i x_{i,j} -\nu\Bigr| \tag{2.1}\\
\text{subject to} \quad &\mu_j = \frac{1}{l}\sum_{k=1}^l\sum_{i=1}^n a_{i,k}x_{i,j} &j=1,\ldots,m \tag{2.2}\\
& \nu = \frac{1}{m} \sum_{j=1}^m \sum_{i=1}^n b_i x_{i,j} \tag{2.3}\\
&\sum_{j=1}^m x_{i,j} = 1 &i=1,\ldots,n \tag{2.4}\\
& x_{i,j} \in \{0,1\} &i=1,\ldots,n;\, j=1,\ldots,m \tag{2.5}\\
\end{align}

Such an absolute value minimization problem can be transformed into a linear programming problem by introducing the auxiliary variables $y_{j,k}, z_j$ as follows.

\begin{align}
\text{minimize} \quad &\sum_{j=1}^m \Bigl(\sum_{k=1}^l y_{j,k} + 1.5 z_j\Bigr) \tag{2.6}\\
\text{subject to} \quad & -y_{j,k} \leq \sum_{i=1}^n a_{i,k} x_{i,j} - \mu_j \leq y_{j,k} & j=1,\dots,m;\,k=1,\ldots,l \tag{2.7}\\
& -z_j \leq \sum_{i=1}^n b_i x_{i,j} -\nu \leq z_j & j=1,\ldots,m \tag{2.8}\\
&\mu_j = \frac{1}{l}\sum_{k=1}^l\sum_{i=1}^n a_{i,k}x_{i,j} &j=1,\ldots,m \tag{2.9}\\
& \nu = \frac{1}{m} \sum_{j=1}^m \sum_{i=1}^n b_i x_{i,j} \tag{2.10}\\
&\sum_{j=1}^m x_{i,j} = 1 &i=1,\ldots,n \tag{2.11}\\
& x_{i,j} \in \{0,1\} &i=1,\ldots,n;\, j=1,\ldots,m \tag{2.12}\\
\end{align}

Compared to the range minimization model, the number of variables is larger, which makes the model a little more complicated.

Example: Optimization by Python + PuLP

Let's solve the example using Python's PuLP package.

You can select the solver in PuLP, but here we will use the default Coin-CBC.

import numpy as np
import pandas as pd
from pulp import *
from ortoolpy import addvar, addvars, addbinvars

#example
teams = ['A', 'B', 'C']
members = ['P', 'Q', 'R', 'S', 'T', 'U']
skills = ['Controller', 'Promoter', 'Supporter', 'Analyzer']
scores = pd.DataFrame([[6, 0, 1, 3],
                       [2, -1, 3, 5],
                       [2, 4, 0, 0],
                       [3, 4, 5, 0],
                       [0, 2, 1, 4],
                       [2, 3, -1, 1]],
                      index=members,
                      columns=skills)

nteams = len(teams)  #Number of teams
nmembers = len(scores.index)  #Number of members
nskills = len(scores.columns)  #Number of ability types

print(scores)

#    Controller  Promoter  Supporter  Analyzer
# P           6         0          1         3
# Q           2        -1          3         5
# R           2         4          0         0
# S           3         4          5         0
# T           0         2          1         4
# U           2         3         -1         1

Minimize range

For the range minimization code, please refer to the original article.
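The original article's code is not reproduced here, but a minimal sketch of the range model (1.1)-(1.5) in plain PuLP might look like this (variable names are my own, not the original article's):

```python
import pandas as pd
from pulp import LpProblem, LpVariable, LpBinary, lpDot, lpSum, value

teams = ['A', 'B', 'C']
members = ['P', 'Q', 'R', 'S', 'T', 'U']
skills = ['Controller', 'Promoter', 'Supporter', 'Analyzer']
scores = pd.DataFrame([[6, 0, 1, 3], [2, -1, 3, 5], [2, 4, 0, 0],
                       [3, 4, 5, 0], [0, 2, 1, 4], [2, 3, -1, 1]],
                      index=members, columns=skills)

m = LpProblem()  # default sense: minimize
x = pd.DataFrame([[LpVariable(f'x_{i}_{j}', cat=LpBinary) for j in teams]
                  for i in members], index=members, columns=teams)
ymin = {j: LpVariable(f'ymin_{j}') for j in teams}
ymax = {j: LpVariable(f'ymax_{j}') for j in teams}
zmin, zmax = LpVariable('zmin'), LpVariable('zmax')

m += lpSum(ymax[j] - ymin[j] for j in teams) + 1.5 * (zmax - zmin)  # (1.1)
for j in teams:
    for k in skills:
        m += ymin[j] <= lpDot(scores[k], x[j])   # (1.2) lower part
        m += lpDot(scores[k], x[j]) <= ymax[j]   # (1.2) upper part
    m += zmin <= lpDot(scores.sum(1), x[j])      # (1.3) lower part
    m += lpDot(scores.sum(1), x[j]) <= zmax      # (1.3) upper part
for i in members:
    m += lpSum(x.loc[i]) == 1                    # (1.4) one team per member
m.solve()

assign = {i: next(j for j in teams if value(x.loc[i, j]) > 0.5) for i in members}
print(assign)
```

This is a sketch under the default CBC solver; the exact assignment may differ from the mean deviation model when ties exist.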

Minimize mean deviation

#Minimize mean deviation

m = LpProblem()  #Mathematical model
x = pd.DataFrame(addbinvars(nmembers, nteams), index=members, columns=teams)  #allocation
y = pd.DataFrame(addvars(nteams, nskills), index=teams, columns=skills)  #Average deviation within the team
mu = pd.Series(addvars(nteams), index=teams)  #Mean skill level within each team
z = pd.Series(addvars(nteams), index=teams)  #Deviation of each team's total from the overall mean
nu = addvar()  #Average for all teams

m.setObjective(lpSum([lpSum(y.loc[j]) + 1.5*z.loc[j] for j in teams]))  #Objective function

m.addConstraint(lpSum(np.dot(scores.sum(1), x)) / nteams == nu)
for j in teams:
    m.addConstraint(lpDot(scores.sum(1), x[j]) - nu <= z.loc[j])
    m.addConstraint(lpDot(scores.sum(1), x[j]) - nu >= -z.loc[j])
    m.addConstraint(lpSum(np.dot(x[j], scores)) / nskills == mu.loc[j])
    for k in skills:
        m.addConstraint(lpDot(scores[k], x[j]) - mu.loc[j] <= y.loc[j,k])
        m.addConstraint(lpDot(scores[k], x[j]) - mu.loc[j] >= -y.loc[j,k])
for i in members:
    m.addConstraint(lpSum(x.loc[i]) == 1)  #Belong to some team
        
m.solve()  #Solution

if m.status:
    for i, xi in enumerate(np.vectorize(value)(x).tolist()):
        print((members[i], teams[xi.index(1)]))
else:
    print('The optimal solution could not be obtained.')

#result:(member,team)
# ('P', 'B')
# ('Q', 'A')
# ('R', 'A')
# ('S', 'C')
# ('T', 'B')
# ('U', 'C')

I got the same solution as the original article. Even though I solved a different model, it happened to produce the same assignment.
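We can sanity-check the balance of this assignment by totaling each team's scores (a quick check using pandas, based on the result printed above):

```python
import pandas as pd

members = ['P', 'Q', 'R', 'S', 'T', 'U']
skills = ['Controller', 'Promoter', 'Supporter', 'Analyzer']
scores = pd.DataFrame([[6, 0, 1, 3], [2, -1, 3, 5], [2, 4, 0, 0],
                       [3, 4, 5, 0], [0, 2, 1, 4], [2, 3, -1, 1]],
                      index=members, columns=skills)

# Assignment obtained above: (member, team)
assignment = {'P': 'B', 'Q': 'A', 'R': 'A', 'S': 'C', 'T': 'B', 'U': 'C'}

team_totals = scores.groupby(pd.Series(assignment)).sum()  # per-skill totals by team
print(team_totals)
print(team_totals.sum(1))  # total strength per team: A=15, B=17, C=17
```

The team totals (15, 17, 17) are close together, confirming that the total strength across teams is fairly even.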

Conclusion

We have taken up the mean deviation minimization problem. This model can also be used, for example, in portfolio optimization, where the variation in expected return is regarded as risk and the problem is formulated as risk minimization; the mean deviation is sometimes used as the risk measure there. However, for integer programming problems (combinatorial optimization problems, discrete optimization problems), in which some decision variables must be integers, computation time grows rapidly with problem size. Even a problem as small as "dividing 20 people into 5 teams" became difficult to solve in realistic time on my PC (it could not be solved even when left overnight). Using a commercial solver such as Gurobi or CPLEX can bring a significant improvement, especially in a multi-core environment, but large instances can still be hard to solve. In that case, consider revising the model or using an approximate solution method.

[Continued] As an approximate solution method for this problem, a follow-up article introduces how to solve it using Simanneal, a Python package for simulated annealing: Separate teams by combinatorial optimization (simulated annealing).
