Python in optimization

Introduction

In my business, I develop software using combinatorial optimization technology (for example, minimizing transportation costs in logistics). In the past, I used to create optimization models using C ++ and C #, but nowadays I often use Python. Here, we will introduce Python in optimization.

Benefits of Python

The reasons for using Python are as follows.

--Easy to understand. Since the model by mathematical formula and the model by Python are close, you can concentrate on more essential description and create a model that is easy to maintain. sample.png

--Short description amount is required. Compared to C ++ etc., the size of the program is a fraction. --Learning cost is small. It has a simple grammar and few reserved words. --Can be completed with Python. Since it is a general-purpose language, processing for various purposes can be described in almost all Python. For example, you can get data from the web, aggregate it, analyze it, optimize it, visualize it, and so on, all in Python. --There are many libraries. The package community site https://pypi.python.org/pypi alone has about 90,000 packages published. Many other packages are also available on https://github.com/ and https://anaconda.org/. --Can be executed in various environments. There are various operating systems such as Windows, Mac, and Linux, and processing systems such as Cython, Pypy, and IronPython. --Many optimization software supports Python. There are many optimization software, both paid and free, but many are available from Python.

Python is said to run slower than compiler languages such as C ++. However, in optimization, Python is mainly used for model creation (modeling), and dedicated software (solver) written in C ++ etc. is used for execution of optimization algorithms. Therefore, even if you use Python for optimization, the execution time does not matter much.

Optimizing modeling mainly uses PuLP and pandas packages.

--PuLP is a package of mathematical modeling and pandas is a package of data analysis. ――Pandas is suitable for handling data that can be represented in a table among the data included in the model, and can describe complicated processing in an easy-to-understand manner. Also, pandas uses numpy internally. --Numpy uses a highly optimized linear algebra library written in C or Fortran to efficiently calculate matrix calculations.

About PuLP

To solve a mathematical optimization problem, do the following steps:

--Create a mathematical model in the modeler --Call out the solver and get a solution

Solver is software that takes a mathematical model as input, solves the mathematical model, and outputs the value (solution) of a variable.

image.png

PuLP is software created by the COIN project and will be a modeler. With PuLP, you can use various solvers such as CBC, Gurobi, and GLPK. By default, CBC is used. When you install PuLP, CBC will be installed at the same time.

--CBC: COIN Project Free Solver (short for COIN-OR Branch and Cut) https://projects.coin-or.org/Cbc --Gurobi: High-performance commercial solver http://www.gurobi.com/ --GLPK: GNU free solver www.gnu.org/software/glpk/

The problem that PuLP can handle is the mixed integer optimization problem. The mixed integer optimization problem is a kind of mathematical optimization problem and has the following features.

--Represented using continuous (real) variables and integer variables --The objective function and constraints are linear expressions

If you want to find out more details, please refer to the reference site.

How to use PuLP

Consider the following problem.

problem


I want to create a lot of chemical products X and Y that can be synthesized from materials A and B.
To make 1kg of X, you need 1kg of A and 3kg of B.
To make 1kg of Y, you need 2kg of A and 1kg of B.
The price of both X and Y is 100 yen per kg.
To maximize the sum of the prices of X and Y when material A weighs only 16 kg and B weighs only 18 kg
Find out how many X and Y should be created.

prob.png

The problem is expressed by a mathematical model as follows. Representing a mathematical model with an expression is called formalization.

formula.png

Let's model this with PuLP and solve it.

python3


from pulp import *
m = LpProblem(sense=LpMaximize) #Mathematical model
x = LpVariable('x', lowBound=0) #variable
y = LpVariable('y', lowBound=0) #variable
m += 100 * x + 100 * y #Objective function
m += x + 2 * y <= 16 #Upper limit constraint for material A
m += 3 * x + y <= 18 #Upper limit constraint for material B
m.solve() #Run solver
print(value(x), value(y)) # 4, 6

The following is a brief explanation in order.

Package import

from pulp import *

Creating a mathematical model

For minimization problems: m = LpPrblem () For maximization problems: m = LpProblem (sense = LpMaximize)

Creating variables

Continuous variable: x = LpVariable (variable name, lowBound = 0) 0-1 Variable: x = LpVariable (variable name, cat = LpBinary) List of continuous variables: x = [LpVariable (i-th variable name, lowBound = 0) for i in range (n)] Variable names must be different

Objective function setting

m + = expression

Add constraints

m + = expression == expression m + = expression <= expression m + = expression> = expression

Expression example

2 * x + 3 * y - 5

Sum: lpSum (list of variables) Inner product: lpDot (list of coefficients, list of variables)

Run solver

m.solve()

Values of variables, expressions and objective functions

value (variable), value (expression), value (m.objective)

About the combination of PuLP and pandas

Combining PuLP and pandas and managing variables (LpVariable) in a pandas table (DataFrame) makes the formulation easier to understand.

Let's take the transportation optimization problem as an example.

Transport optimization problem

I want to transport parts from warehouses to factories. I would like to find a plan that minimizes transportation costs.

--I want to determine the amount of transportation from the warehouse group to the factory group → Variable --I want to minimize transportation costs → Objective function ――The amount of goods that can be shipped from each warehouse is less than the amount that can be supplied → Restrictions ――Delivery to each factory is more than demand → Restrictions

Transportation costs Assembly factory
F1 F2 F3 F4 Supply
Warehouse W1 10 10 11 17 35
W21619121441
W31512141242
Demand 28 29 31 25

Parameter setting

Set the required parameters. (The numbers are the same as in the previous table)

python3


import numpy as np, pandas as pd
from itertools import product
from pulp import *
np.random.seed(1)
nw, nf = 3, 4
pr = list(product(range(nw),range(nf)))
Supply= np.random.randint(30, 50, nw)
demand= np.random.randint(20, 40, nf)
Shipping costs= np.random.randint(10, 20, (nw,nf))

Mathematical model without pandas

Variables are accessed by subscripts.

python3


m1 = LpProblem()
v1 = {(i,j):LpVariable('v%d_%d'%(i,j), lowBound=0) for i,j in pr}
m1 += lpSum(Shipping costs[i][j] * v1[i,j] for i,j in pr)
for i in range(nw):
    m1 += lpSum(v1[i,j] for j in range(nf)) <=Supply[i]
for j in range(nf):
    m1 += lpSum(v1[i,j] for i in range(nw)) >=demand[j]
m1.solve()
{k:value(x) for k,x in v1.items() if value(x) > 0}
>>>
{(0, 0): 28.0,
 (0, 1): 7.0,
 (1, 2): 31.0,
 (1, 3): 5.0,
 (2, 1): 22.0,
 (2, 3): 20.0}

Mathematical model using pandas

Variables can be accessed by table attributes. First, let's create a table.

python3


a = pd.DataFrame([(i,j) for i, j in pr], columns=['Warehouse', 'factory'])
a['Shipping costs'] = Shipping costs.flatten()
a[:3]
Warehouse Factory Transportation Costs
00010
10110
20211

Let's make a mathematical model in the same way.

python3


m2 = LpProblem()
a['Var'] = [LpVariable('v%d'%i, lowBound=0) for i in a.index]
m2 += lpDot(a.Shipping costs, a.Var)
for k, v in a.groupby('Warehouse'):
    m2 += lpSum(v.Var) <=Supply[k]
for k, v in a.groupby('factory'):
    m2 += lpSum(v.Var) >=demand[k]
m2.solve()
a['Val'] = a.Var.apply(value)
a[a.Val > 0]
Warehouse Factory Shipping costs Var Val
0 0 0 10 v0 28.0
1 0 1 10 v1 7.0
6 1 2 12 v6 31.0
7 1 3 14 v7 5.0
9 2 1 12 v9 22.0
11 2 3 12 v11 20.0

Expressions using subscripts had to remember what the subscripts meant. However, the combination of PuLP and pandas makes the mathematical model easier to understand, as shown below.

--You can use column names such as "warehouse" instead of just "i". --You can construct mathematical formulas using conditional expressions in pandas. (Reference Solving the N Queen problem with combinatorial optimization) --You can use convenient functions of pandas (groupby, etc.).

Reference site

--Qiita article -Use combinatorial optimization -Sum of variables in mathematical model -Power of combinatorial optimization solver -Investigate duality issues -Mathematical optimization for free work with Python + PuLP -Mathematical Optimization Modeler (PuLP) Cheat Sheet (Python) --Example of combination of PuLP and pandas -[Visualization of data by prefecture (4 color problem)](http://qiita.com/Tsutomu-KKE@github/items/6d17889ba47357e44131#4%E8%89%B2%E5%95%8F%E9%A1% 8C) -Let's decide the date course by combinatorial optimization -Solving N Queen problem with combinatorial optimization -Let's find out the schedule of the tournament that makes you sick -Creating a conference program with combinatorial optimization -Maximize restaurant sales by combinatorial optimization -Optimize road installation -Sudoku is combined and solved optimally -Consider menus by combinatorial optimization --Articles other than Qiita -Combinatorial optimization (Professor Matsui) (PDF 2 pages) -Expected solution to large-scale combinatorial optimization problem (Professor Umetani) (PDF page 51) --Books -"Can be used from today! Combinatorial optimization" -"Business Analytics in Python Language" -"Aspects of Modeling (Series: Optimized Modeling)" --Solver related -PuLP documentation -Memo of Integer Programming (Miyashiro-sensei) -Introduction to Formulation by Integer Programming -Introduction to Integer Programming Solver -Mathematical optimization by ZIMPL language and SCIP - Gurobi Optimizer

that's all

Recommended Posts

Python in optimization
Solve optimization problems in Python
Quadtree in Python --2
CURL in python
Metaprogramming in Python
Python 3.3 in Anaconda
Geocoding in python
SendKeys in Python
Meta-analysis in Python
Unittest in python
FX Systre Parameter Optimization in Python
Bayesian optimization package GPyOpt in Python
Epoch in Python
Discord in Python
Sudoku in Python
DCI in Python
quicksort in python
nCr in python
N-Gram in Python
Programming in python
Plink in Python
Constant in python
Lifegame in Python.
FizzBuzz in Python
Sqlite in python
StepAIC in Python
N-gram in python
LINE-Bot [0] in Python
Csv in python
Disassemble in Python
Reflection in Python
Constant in python
nCr in Python.
format in python
Scons in Python3
Puyo Puyo in python
python in virtualenv
PPAP in Python
Quad-tree in Python
Reflection in Python
Chemistry in Python
Hashable in python
DirectLiNGAM in Python
LiNGAM in Python
Flatten in python
flatten in python
I tried using Bayesian Optimization in Python
Sorted list in Python
Daily AtCoder # 36 in Python
Clustering text in Python
Daily AtCoder # 2 in Python
Implement Enigma in python
Daily AtCoder # 32 in Python
Daily AtCoder # 6 in Python
Daily AtCoder # 18 in Python
Edit fonts in Python
Singleton pattern in Python
File operations in Python
Read DXF in python
Daily AtCoder # 53 in Python
Key input in Python