[PYTHON] Use the bin packing problem to reduce cloud usage fees

Optimization of cloud usage using bin packing problem

Now, when using clouds such as AWS, Azure, and GCP, you may be wondering which application should be installed in a virtual machine of what size for efficiency. Given that the CPU, memory, and disk usage of the application are known, what size virtual machine should the application be placed in, whether it should be collocated, distributed, and so on. There is. It seems that engineers with a long history of using the cloud will know which size to choose based on experience. However, this time I changed my approach a little and wondered if I could find a solution as an optimization problem.

For example, in the following situations, what size application should be put in what size virtual machine to be efficient?

1.png

First of all

The optimization problem seemed to be interesting, so after studying, I thought about a problem that was familiar to me. Since the optimization problem history is one week, please point out any mistakes or advice.

The following books were used for studying. [Optimization / Statistical Analysis / Machine Learning for Business Analytics Practitioners in Python Language](https://www.amazon.co.jp/Python%E8%A8%80%E8%AA%9E%E3%81] % AB% E3% 82% 88% E3% 82% 8B% E3% 83% 93% E3% 82% B8% E3% 83% 8D% E3% 82% B9% E3% 82% A2% E3% 83% 8A % E3% 83% AA% E3% 83% 86% E3% 82% A3% E3% 82% AF% E3% 82% B9-% E5% AE% 9F% E5% 8B% 99% E5% AE% B6% E3% 81% AE% E3% 81% 9F% E3% 82% 81% E3% 81% AE% E6% 9C% 80% E9% 81% A9% E5% 8C% 96% E3% 83% BB% E7% B5% B1% E8% A8% 88% E8% A7% A3% E6% 9E% 90% E3% 83% BB% E6% A9% 9F% E6% A2% B0% E5% AD% A6% E7% BF% 92-% E4% B9% 85% E4% BF% 9D-% E5% B9% B9% E9% 9B% 84 / dp / 4764905167 / ref = sr_1_1? Ie = UTF8 & qid = 1495359888 & sr = 8-1 & keywords = python +% E3% 83% 93% E3% 82% B8% E3% 83% 8D% E3% 82% B9% E3% 83% BB% E3% 82% A2% E3% 83% 8A% E3% 83% AA% E3% 83% 86% E3% 82% A3% E3% 82% AF% E3% 82% B9)

Problem setting

This time, as a result of listing the required number of applications on the virtual machine in the cloud, I would like to find the configuration with the lowest cost as a bin packing problem.

The bin-packing problem is a combination optimization problem that finds the minimum number of containers required when packing a package (box, bottle, container) with luggage (weight and number are fixed). For example, when you move, you might pack your luggage in cardboard boxes, but you need to solve the packing method to minimize the number of cardboard boxes.

2.png

If you want to pack your luggage in cardboard, consider the volume and weight of the luggage for the volume (width x length x height) and load capacity of the cardboard box. When I saw this, I felt that I could optimize the use of virtual machines in the cloud by replacing luggage with applications, cardboard as virtual machines, and volume and weight with CPU, RAM, Disk, and costs. This is the beginning.

environment

I would like to solve an optimization problem in Python. Python provides a variety of useful libraries for solving optimization problems, which are explained in detail here. Python in optimization

This time I used openopt with Python3.5 (Anaconda). The OS is CentOS 7.3. Openopt is a library that creates models for mathematical optimization.

Follow the steps below to install openopt in this environment.

conda install --channel https://conda.anaconda.org/cachemeorg funcdesigner openopt
pip install cvxopt
pip install glpk

things to do

This time, we will divide the application into three types. Small applications, medium applications, large applications. The usage of each resource is as follows.

Small application Medium application Large application
CPU: 0.2 CPU: 0.5 CPU: 2.4
RAM: 256MB RAM: 512MB RAM: 2048MB
DISK: 1GB DISK: 10GB DISK: 40GB

Use the bin-packing problem to solve which of the following EC2 instance sizes is the cheapest to pack.

M4.4xlarge R3.2xlarge C4.2xlarge
CPU: 16vCPU CPU: 8vCPU CPU: 8vCPU
RAM: 64GB RAM: 61GB RAM: 15GB
Disk: 100GB Disk: 100GB Disk: 100GB
$1.032 / hour $0.798 / hour $0.504 / hour

The unit price is the price when using a Linux on-demand instance in the Tokyo region as of today (May 21, 2017). Reference Also, the cost of the disc (EBS) is not included.

program

The full program is here.

# import openopt
from openopt import *

#Set the number of small, medium, and large applications.
small_num = 20
med_num = 12
large_num = 9

apps = []

#Make the resource usage of each application dict and add it to the list.
for i in range(small_num):
    small_app = {
        'name': 'small%d' % i,
        'cpu': 0.2,
        'mem': 256,
        'disk': 1
        }
    apps.append(small_app)

for i in range(med_num):
    med_app = {
        'name': 'medium%d' % i,
        'cpu': 0.5,
        'mem': 512,
        'disk': 10
        }
    apps.append(med_app)
    
for i in range(large_num):
    large_app = {
        'name': 'large%d' % i,
        'cpu': 2.4,
        'mem': 2048,
        'disk': 40
        }
    apps.append(large_app)


#Sets the size of your AWS EC2 instance.
#Each resource is multiplied by 9 because the OS is 10 of the resources%This is because it is assumed that
instance_sizes = [
    {
        'name': 'm4.x4large',
        'cost': 1.032 * 24 * 30,
        'size': {
            'cpu': 16 * 0.9,
            'mem': 64 * 1024 * 0.9, 
            'disk': 1000 * 0.9
        }
    },
    {
        'name': 'r3.2xlarge',
        'cost': 0.798 * 24 * 30,
        'size': {
            'cpu': 8 * 0.9,
            'mem': 61 * 1024 * 0.9, 
            'disk': 1000 * 0.9
        }
    },
    {
        'name': 'c4.2xlarge',
        'cost': 0.504 * 24 * 30,
        'size': {
            'cpu': 8 * 0.9,
            'mem': 15 * 1024 * 0.9, 
            'disk': 1000 * 0.9
        }
    }
]

#It is bin packing.
#We use a function called BPP from openopt.
def bin_pack_instance(apps, instance_size):
    cost = instance_size['cost']    
    p = BPP(apps, instance_size['size'], goal = 'min')
    r = p.solve('glpk', iprint = 0)
    instances = len(r.xf)
    total_cost = instances * cost
    return r, instances, total_cost

#I will do it.
#Bin packing for each instance size and find the cheapest one.
if __name__ == '__main__':
    list_cost = []
    for instance in instance_sizes:
        r, instances, total_cost = bin_pack_instance(apps, instance)
        list_cost.append({'instance': instance['name'], 'total_cost': total_cost})

        print("\r") 
        print("Bin packing for : {0}".format(instance['name']))
        print("Total number of apps is " + str(len(apps)))
        print("Total {0} instance used is {1}".format(instance['name'], instances))
        print("Total cost is {0}".format(total_cost))

        for i,s in enumerate(r.xf):
            print ("Instance {0} contains {1} apps".format(i, len(s)))
            print("\t CPU: {0}vCPU\t RAM: {1}MB\t Disk: {2}GB"
                  .format(r.values['cpu'][i], r.values['mem'][i], r.values['disk'][i]))
            print("\t Contains: {0}".format(r.xf[i]))

        print("\r")  
    
    print("Result: {0}".format(list_cost))

The result is as follows.

------------------------- OpenOpt 0.5625 -------------------------
problem: unnamed   type: MILP    goal: min
solver: glpk
  iter  objFunVal  log10(maxResidual)  
    0  0.000e+00               0.00 
    1  0.000e+00            -100.00 
istop: 1000 (optimal)
Solver:   Time Elapsed = 0.12 	CPU Time Elapsed = 0.12
objFuncValue: 3 (feasible, MaxResidual = 0)

Bin packing for : m4.x4large
Total number of apps is 41
Total m4.x4large instance used is 3
Total cost is 2229.12
Instance 0 contains 18 apps
	 CPU: 14.200000000000001vCPU	 RAM: 13312.0MB	 Disk: 228.0GB
	 Contains: ('small0', 'small3', 'small4', 'small5', 'small6', 'small7', 'small8', 'small13', 'medium0', 'medium1', 'medium2', 'medium3', 'medium4', 'medium5', 'large3', 'large4', 'large6', 'large7')
Instance 1 contains 17 apps
	 CPU: 14.4vCPU	 RAM: 13312.0MB	 Disk: 212.0GB
	 Contains: ('small1', 'small2', 'small9', 'small10', 'small11', 'small12', 'small14', 'small15', 'small16', 'small17', 'small18', 'small19', 'large0', 'large1', 'large2', 'large5', 'large8')
Instance 2 contains 6 apps
	 CPU: 3.0vCPU	 RAM: 3072.0MB	 Disk: 60.0GB
	 Contains: ('medium6', 'medium7', 'medium8', 'medium9', 'medium10', 'medium11')


------------------------- OpenOpt 0.5625 -------------------------
problem: unnamed   type: MILP    goal: min
solver: glpk
  iter  objFunVal  log10(maxResidual)  
    0  0.000e+00               0.00 
    1  0.000e+00            -100.00 
istop: 1000 (optimal)
Solver:   Time Elapsed = 0.24 	CPU Time Elapsed = 0.23
objFuncValue: 5 (feasible, MaxResidual = 0)

Bin packing for : r3.2xlarge
Total number of apps is 41
Total r3.2xlarge instance used is 5
Total cost is 2872.8
Instance 0 contains 3 apps
	 CPU: 7.199999999999999vCPU	 RAM: 6144.0MB	 Disk: 120.0GB
	 Contains: ('large0', 'large4', 'large7')
Instance 1 contains 11 apps
	 CPU: 7.199999999999999vCPU	 RAM: 6912.0MB	 Disk: 107.0GB
	 Contains: ('small5', 'small6', 'small7', 'small8', 'small9', 'small18', 'small19', 'medium0', 'medium1', 'large1', 'large2')
Instance 2 contains 13 apps
	 CPU: 7.0vCPU	 RAM: 6912.0MB	 Disk: 91.0GB
	 Contains: ('small0', 'small1', 'small2', 'small10', 'small11', 'small12', 'small13', 'small14', 'small15', 'small16', 'small17', 'large5', 'large6')
Instance 3 contains 8 apps
	 CPU: 7.199999999999999vCPU	 RAM: 6656.0MB	 Disk: 122.0GB
	 Contains: ('small3', 'small4', 'medium2', 'medium3', 'medium4', 'medium5', 'large3', 'large8')
Instance 4 contains 6 apps
	 CPU: 3.0vCPU	 RAM: 3072.0MB	 Disk: 60.0GB
	 Contains: ('medium6', 'medium7', 'medium8', 'medium9', 'medium10', 'medium11')


------------------------- OpenOpt 0.5625 -------------------------
problem: unnamed   type: MILP    goal: min
solver: glpk
  iter  objFunVal  log10(maxResidual)  
    0  0.000e+00               0.00 
    1  0.000e+00            -100.00 
istop: 1000 (optimal)
Solver:   Time Elapsed = 0.14 	CPU Time Elapsed = 0.14
objFuncValue: 5 (feasible, MaxResidual = 0)

Bin packing for : c4.2xlarge
Total number of apps is 41
Total c4.2xlarge instance used is 5
Total cost is 1814.4
Instance 0 contains 7 apps
	 CPU: 5.4vCPU	 RAM: 5120.0MB	 Disk: 100.0GB
	 Contains: ('medium0', 'medium1', 'medium2', 'medium3', 'medium4', 'medium5', 'large6')
Instance 1 contains 15 apps
	 CPU: 7.0vCPU	 RAM: 7168.0MB	 Disk: 108.0GB
	 Contains: ('small8', 'small9', 'small10', 'small14', 'small16', 'small17', 'small18', 'small19', 'medium6', 'medium7', 'medium8', 'medium9', 'medium10', 'medium11', 'large0')
Instance 2 contains 14 apps
	 CPU: 7.199999999999999vCPU	 RAM: 7168.0MB	 Disk: 92.0GB
	 Contains: ('small0', 'small1', 'small2', 'small3', 'small4', 'small5', 'small6', 'small7', 'small11', 'small12', 'small13', 'small15', 'large3', 'large4')
Instance 3 contains 3 apps
	 CPU: 7.199999999999999vCPU	 RAM: 6144.0MB	 Disk: 120.0GB
	 Contains: ('large1', 'large2', 'large5')
Instance 4 contains 2 apps
	 CPU: 4.8vCPU	 RAM: 4096.0MB	 Disk: 80.0GB
	 Contains: ('large7', 'large8')

Result: [{'instance': 'm4.x4large', 'total_cost': 2229.12}, {'instance': 'r3.2xlarge', 'total_cost': 2872.8}, {'instance': 'c4.2xlarge', 'total_cost': 1814.4}]


It will be longer, but as a result, it seems most efficient to pack c4.2xlarge into 4 units, which costs $ 1814.4 per month.

Impressions

This time, I considered application placement as an optimization problem. Of course, there are few cases where all applications are packed in an instance of the same size, and there are many points to consider when considering the architecture, such as subnet separation. Actually, I wanted to calculate a configuration that mixed multiple instance sizes (c4.2xlarge 3 units, t2.micro 4 units, etc.), but I do not know how to pack in multiple bins of different sizes, so this shape became. This will be an issue for the future. If you have any details, please let me know.

reference

Use combinatorial optimization Combinatorial optimization-standard problems and execution method Python in optimization How to solve the bin packing problem [Optimization / Statistical Analysis / Machine Learning for Business Analytics Practitioners in Python Language](https://www.amazon.co.jp/Python%E8%A8%80%E8%AA%9E%E3%81] % AB% E3% 82% 88% E3% 82% 8B% E3% 83% 93% E3% 82% B8% E3% 83% 8D% E3% 82% B9% E3% 82% A2% E3% 83% 8A % E3% 83% AA% E3% 83% 86% E3% 82% A3% E3% 82% AF% E3% 82% B9-% E5% AE% 9F% E5% 8B% 99% E5% AE% B6% E3% 81% AE% E3% 81% 9F% E3% 82% 81% E3% 81% AE% E6% 9C% 80% E9% 81% A9% E5% 8C% 96% E3% 83% BB% E7% B5% B1% E8% A8% 88% E8% A7% A3% E6% 9E% 90% E3% 83% BB% E6% A9% 9F% E6% A2% B0% E5% AD% A6% E7% BF% 92-% E4% B9% 85% E4% BF% 9D-% E5% B9% B9% E9% 9B% 84 / dp / 4764905167 / ref = sr_1_1? Ie = UTF8 & qid = 1495359888 & sr = 8-1 & keywords = python +% E3% 83% 93% E3% 82% B8% E3% 83% 8D% E3% 82% B9% E3% 83% BB% E3% 82% A2% E3% 83% 8A% E3% 83% AA% E3% 83% 86% E3% 82% A3% E3% 82% AF% E3% 82% B9) https://github.com/PythonCharmers/OOSuite/blob/master/OpenOpt/openopt/examples/bpp_2.py

Recommended Posts

Use the bin packing problem to reduce cloud usage fees
How to solve the bin packing problem
Consideration of approach to bin packing problem
How to use the Google Cloud Translation API
How to use the generator
How to use the decorator
How to use the zip function
How to use the optparse module
How to use the ConfigParser module
Use Raspberry Pi to solve the problem of insufficient mobile Wi-Fi connection
How to use the Spark ML pipeline
[Linux] How to use the echo command
How to use the Linux grep command
Use numpy's .flatten () [0] to retrieve the value
How to use the IPython debugger (ipdb)
How to use GCP's Cloud Vision API
3 best ways to use the less command
Use linter to reduce code review costs