Now, when using clouds such as AWS, Azure, and GCP, you may be wondering which application should be installed in a virtual machine of what size for efficiency. Given that the CPU, memory, and disk usage of the application are known, what size virtual machine should the application be placed in, whether it should be collocated, distributed, and so on. There is. It seems that engineers with a long history of using the cloud will know which size to choose based on experience. However, this time I changed my approach a little and wondered if I could find a solution as an optimization problem.
For example, in the following situations, what size application should be put in what size virtual machine to be efficient?
The optimization problem seemed to be interesting, so after studying, I thought about a problem that was familiar to me. Since the optimization problem history is one week, please point out any mistakes or advice.
The following books were used for studying. [Optimization / Statistical Analysis / Machine Learning for Business Analytics Practitioners in Python Language](https://www.amazon.co.jp/Python%E8%A8%80%E8%AA%9E%E3%81] % AB% E3% 82% 88% E3% 82% 8B% E3% 83% 93% E3% 82% B8% E3% 83% 8D% E3% 82% B9% E3% 82% A2% E3% 83% 8A % E3% 83% AA% E3% 83% 86% E3% 82% A3% E3% 82% AF% E3% 82% B9-% E5% AE% 9F% E5% 8B% 99% E5% AE% B6% E3% 81% AE% E3% 81% 9F% E3% 82% 81% E3% 81% AE% E6% 9C% 80% E9% 81% A9% E5% 8C% 96% E3% 83% BB% E7% B5% B1% E8% A8% 88% E8% A7% A3% E6% 9E% 90% E3% 83% BB% E6% A9% 9F% E6% A2% B0% E5% AD% A6% E7% BF% 92-% E4% B9% 85% E4% BF% 9D-% E5% B9% B9% E9% 9B% 84 / dp / 4764905167 / ref = sr_1_1? Ie = UTF8 & qid = 1495359888 & sr = 8-1 & keywords = python +% E3% 83% 93% E3% 82% B8% E3% 83% 8D% E3% 82% B9% E3% 83% BB% E3% 82% A2% E3% 83% 8A% E3% 83% AA% E3% 83% 86% E3% 82% A3% E3% 82% AF% E3% 82% B9)
This time, as a result of listing the required number of applications on the virtual machine in the cloud, I would like to find the configuration with the lowest cost as a bin packing problem.
The bin-packing problem is a combination optimization problem that finds the minimum number of containers required when packing a package (box, bottle, container) with luggage (weight and number are fixed). For example, when you move, you might pack your luggage in cardboard boxes, but you need to solve the packing method to minimize the number of cardboard boxes.
If you want to pack your luggage in cardboard, consider the volume and weight of the luggage for the volume (width x length x height) and load capacity of the cardboard box. When I saw this, I felt that I could optimize the use of virtual machines in the cloud by replacing luggage with applications, cardboard as virtual machines, and volume and weight with CPU, RAM, Disk, and costs. This is the beginning.
I would like to solve an optimization problem in Python. Python provides a variety of useful libraries for solving optimization problems, which are explained in detail here. Python in optimization
This time I used openopt with Python3.5 (Anaconda). The OS is CentOS 7.3. Openopt is a library that creates models for mathematical optimization.
Follow the steps below to install openopt in this environment.
conda install --channel https://conda.anaconda.org/cachemeorg funcdesigner openopt
pip install cvxopt
pip install glpk
This time, we will divide the application into three types. Small applications, medium applications, large applications. The usage of each resource is as follows.
Small application | Medium application | Large application |
---|---|---|
CPU: 0.2 | CPU: 0.5 | CPU: 2.4 |
RAM: 256MB | RAM: 512MB | RAM: 2048MB |
DISK: 1GB | DISK: 10GB | DISK: 40GB |
Use the bin-packing problem to solve which of the following EC2 instance sizes is the cheapest to pack.
M4.4xlarge | R3.2xlarge | C4.2xlarge |
---|---|---|
CPU: 16vCPU | CPU: 8vCPU | CPU: 8vCPU |
RAM: 64GB | RAM: 61GB | RAM: 15GB |
Disk: 100GB | Disk: 100GB | Disk: 100GB |
$1.032 / hour | $0.798 / hour | $0.504 / hour |
The unit price is the price when using a Linux on-demand instance in the Tokyo region as of today (May 21, 2017). Reference Also, the cost of the disc (EBS) is not included.
The full program is here.
# import openopt
from openopt import *
#Set the number of small, medium, and large applications.
small_num = 20
med_num = 12
large_num = 9
apps = []
#Make the resource usage of each application dict and add it to the list.
for i in range(small_num):
small_app = {
'name': 'small%d' % i,
'cpu': 0.2,
'mem': 256,
'disk': 1
}
apps.append(small_app)
for i in range(med_num):
med_app = {
'name': 'medium%d' % i,
'cpu': 0.5,
'mem': 512,
'disk': 10
}
apps.append(med_app)
for i in range(large_num):
large_app = {
'name': 'large%d' % i,
'cpu': 2.4,
'mem': 2048,
'disk': 40
}
apps.append(large_app)
#Sets the size of your AWS EC2 instance.
#Each resource is multiplied by 9 because the OS is 10 of the resources%This is because it is assumed that
instance_sizes = [
{
'name': 'm4.x4large',
'cost': 1.032 * 24 * 30,
'size': {
'cpu': 16 * 0.9,
'mem': 64 * 1024 * 0.9,
'disk': 1000 * 0.9
}
},
{
'name': 'r3.2xlarge',
'cost': 0.798 * 24 * 30,
'size': {
'cpu': 8 * 0.9,
'mem': 61 * 1024 * 0.9,
'disk': 1000 * 0.9
}
},
{
'name': 'c4.2xlarge',
'cost': 0.504 * 24 * 30,
'size': {
'cpu': 8 * 0.9,
'mem': 15 * 1024 * 0.9,
'disk': 1000 * 0.9
}
}
]
#It is bin packing.
#We use a function called BPP from openopt.
def bin_pack_instance(apps, instance_size):
cost = instance_size['cost']
p = BPP(apps, instance_size['size'], goal = 'min')
r = p.solve('glpk', iprint = 0)
instances = len(r.xf)
total_cost = instances * cost
return r, instances, total_cost
#I will do it.
#Bin packing for each instance size and find the cheapest one.
if __name__ == '__main__':
list_cost = []
for instance in instance_sizes:
r, instances, total_cost = bin_pack_instance(apps, instance)
list_cost.append({'instance': instance['name'], 'total_cost': total_cost})
print("\r")
print("Bin packing for : {0}".format(instance['name']))
print("Total number of apps is " + str(len(apps)))
print("Total {0} instance used is {1}".format(instance['name'], instances))
print("Total cost is {0}".format(total_cost))
for i,s in enumerate(r.xf):
print ("Instance {0} contains {1} apps".format(i, len(s)))
print("\t CPU: {0}vCPU\t RAM: {1}MB\t Disk: {2}GB"
.format(r.values['cpu'][i], r.values['mem'][i], r.values['disk'][i]))
print("\t Contains: {0}".format(r.xf[i]))
print("\r")
print("Result: {0}".format(list_cost))
The result is as follows.
------------------------- OpenOpt 0.5625 -------------------------
problem: unnamed type: MILP goal: min
solver: glpk
iter objFunVal log10(maxResidual)
0 0.000e+00 0.00
1 0.000e+00 -100.00
istop: 1000 (optimal)
Solver: Time Elapsed = 0.12 CPU Time Elapsed = 0.12
objFuncValue: 3 (feasible, MaxResidual = 0)
Bin packing for : m4.x4large
Total number of apps is 41
Total m4.x4large instance used is 3
Total cost is 2229.12
Instance 0 contains 18 apps
CPU: 14.200000000000001vCPU RAM: 13312.0MB Disk: 228.0GB
Contains: ('small0', 'small3', 'small4', 'small5', 'small6', 'small7', 'small8', 'small13', 'medium0', 'medium1', 'medium2', 'medium3', 'medium4', 'medium5', 'large3', 'large4', 'large6', 'large7')
Instance 1 contains 17 apps
CPU: 14.4vCPU RAM: 13312.0MB Disk: 212.0GB
Contains: ('small1', 'small2', 'small9', 'small10', 'small11', 'small12', 'small14', 'small15', 'small16', 'small17', 'small18', 'small19', 'large0', 'large1', 'large2', 'large5', 'large8')
Instance 2 contains 6 apps
CPU: 3.0vCPU RAM: 3072.0MB Disk: 60.0GB
Contains: ('medium6', 'medium7', 'medium8', 'medium9', 'medium10', 'medium11')
------------------------- OpenOpt 0.5625 -------------------------
problem: unnamed type: MILP goal: min
solver: glpk
iter objFunVal log10(maxResidual)
0 0.000e+00 0.00
1 0.000e+00 -100.00
istop: 1000 (optimal)
Solver: Time Elapsed = 0.24 CPU Time Elapsed = 0.23
objFuncValue: 5 (feasible, MaxResidual = 0)
Bin packing for : r3.2xlarge
Total number of apps is 41
Total r3.2xlarge instance used is 5
Total cost is 2872.8
Instance 0 contains 3 apps
CPU: 7.199999999999999vCPU RAM: 6144.0MB Disk: 120.0GB
Contains: ('large0', 'large4', 'large7')
Instance 1 contains 11 apps
CPU: 7.199999999999999vCPU RAM: 6912.0MB Disk: 107.0GB
Contains: ('small5', 'small6', 'small7', 'small8', 'small9', 'small18', 'small19', 'medium0', 'medium1', 'large1', 'large2')
Instance 2 contains 13 apps
CPU: 7.0vCPU RAM: 6912.0MB Disk: 91.0GB
Contains: ('small0', 'small1', 'small2', 'small10', 'small11', 'small12', 'small13', 'small14', 'small15', 'small16', 'small17', 'large5', 'large6')
Instance 3 contains 8 apps
CPU: 7.199999999999999vCPU RAM: 6656.0MB Disk: 122.0GB
Contains: ('small3', 'small4', 'medium2', 'medium3', 'medium4', 'medium5', 'large3', 'large8')
Instance 4 contains 6 apps
CPU: 3.0vCPU RAM: 3072.0MB Disk: 60.0GB
Contains: ('medium6', 'medium7', 'medium8', 'medium9', 'medium10', 'medium11')
------------------------- OpenOpt 0.5625 -------------------------
problem: unnamed type: MILP goal: min
solver: glpk
iter objFunVal log10(maxResidual)
0 0.000e+00 0.00
1 0.000e+00 -100.00
istop: 1000 (optimal)
Solver: Time Elapsed = 0.14 CPU Time Elapsed = 0.14
objFuncValue: 5 (feasible, MaxResidual = 0)
Bin packing for : c4.2xlarge
Total number of apps is 41
Total c4.2xlarge instance used is 5
Total cost is 1814.4
Instance 0 contains 7 apps
CPU: 5.4vCPU RAM: 5120.0MB Disk: 100.0GB
Contains: ('medium0', 'medium1', 'medium2', 'medium3', 'medium4', 'medium5', 'large6')
Instance 1 contains 15 apps
CPU: 7.0vCPU RAM: 7168.0MB Disk: 108.0GB
Contains: ('small8', 'small9', 'small10', 'small14', 'small16', 'small17', 'small18', 'small19', 'medium6', 'medium7', 'medium8', 'medium9', 'medium10', 'medium11', 'large0')
Instance 2 contains 14 apps
CPU: 7.199999999999999vCPU RAM: 7168.0MB Disk: 92.0GB
Contains: ('small0', 'small1', 'small2', 'small3', 'small4', 'small5', 'small6', 'small7', 'small11', 'small12', 'small13', 'small15', 'large3', 'large4')
Instance 3 contains 3 apps
CPU: 7.199999999999999vCPU RAM: 6144.0MB Disk: 120.0GB
Contains: ('large1', 'large2', 'large5')
Instance 4 contains 2 apps
CPU: 4.8vCPU RAM: 4096.0MB Disk: 80.0GB
Contains: ('large7', 'large8')
Result: [{'instance': 'm4.x4large', 'total_cost': 2229.12}, {'instance': 'r3.2xlarge', 'total_cost': 2872.8}, {'instance': 'c4.2xlarge', 'total_cost': 1814.4}]
It will be longer, but as a result, it seems most efficient to pack c4.2xlarge into 4 units, which costs $ 1814.4 per month.
This time, I considered application placement as an optimization problem. Of course, there are few cases where all applications are packed in an instance of the same size, and there are many points to consider when considering the architecture, such as subnet separation. Actually, I wanted to calculate a configuration that mixed multiple instance sizes (c4.2xlarge 3 units, t2.micro 4 units, etc.), but I do not know how to pack in multiple bins of different sizes, so this shape became. This will be an issue for the future. If you have any details, please let me know.
Use combinatorial optimization Combinatorial optimization-standard problems and execution method Python in optimization How to solve the bin packing problem [Optimization / Statistical Analysis / Machine Learning for Business Analytics Practitioners in Python Language](https://www.amazon.co.jp/Python%E8%A8%80%E8%AA%9E%E3%81] % AB% E3% 82% 88% E3% 82% 8B% E3% 83% 93% E3% 82% B8% E3% 83% 8D% E3% 82% B9% E3% 82% A2% E3% 83% 8A % E3% 83% AA% E3% 83% 86% E3% 82% A3% E3% 82% AF% E3% 82% B9-% E5% AE% 9F% E5% 8B% 99% E5% AE% B6% E3% 81% AE% E3% 81% 9F% E3% 82% 81% E3% 81% AE% E6% 9C% 80% E9% 81% A9% E5% 8C% 96% E3% 83% BB% E7% B5% B1% E8% A8% 88% E8% A7% A3% E6% 9E% 90% E3% 83% BB% E6% A9% 9F% E6% A2% B0% E5% AD% A6% E7% BF% 92-% E4% B9% 85% E4% BF% 9D-% E5% B9% B9% E9% 9B% 84 / dp / 4764905167 / ref = sr_1_1? Ie = UTF8 & qid = 1495359888 & sr = 8-1 & keywords = python +% E3% 83% 93% E3% 82% B8% E3% 83% 8D% E3% 82% B9% E3% 83% BB% E3% 82% A2% E3% 83% 8A% E3% 83% AA% E3% 83% 86% E3% 82% A3% E3% 82% AF% E3% 82% B9) https://github.com/PythonCharmers/OOSuite/blob/master/OpenOpt/openopt/examples/bpp_2.py
Recommended Posts