[PYTHON] Try running TensorFlow's LeNet-5 MNIST and AlexNet MNIST on AWS EC2 t2.micro (free tier)

Introduction

TensorFlow is a deep learning framework developed by Google and released under the Apache 2.0 license. It supports GPUs and provides C++ and Python APIs.

"I used the GPU to move it crunchy!"

Since there are many posts such as, I dared to install TensorFlow (Python version, Anaconda) on AWS EC2 t2.micro (for free usage tier) and execute it. I hope it helps you understand the CPU credits for your AWS EC2 T2 instance.

LeNet-5, MNIST

This time we run the sample program tensorflow/models/image/mnist/convolutional.py. The LeNet-5 model looks like this (source): le_net.png

MNIST is a dataset of handwritten digits 0-9 (THE MNIST DATABASE). It contains 60,000 training images and 10,000 evaluation images. MNIST.png
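
The files that convolutional.py downloads (shown in the run log later) are gzipped IDX files. Purely for illustration, they can be read directly like this; the data/ path matches the log below, but the helper itself is an assumption, not code from the article:

```
import gzip
import struct
import numpy

# Read a gzipped IDX image file (magic number 2051 = images) into a uint8 array.
def read_idx_images(path):
    with gzip.open(path, 'rb') as f:
        magic, count, rows, cols = struct.unpack('>IIII', f.read(16))
        assert magic == 2051
        pixels = numpy.frombuffer(f.read(), dtype=numpy.uint8)
        return pixels.reshape(count, rows, cols)

images = read_idx_images('data/train-images-idx3-ubyte.gz')
print(images.shape)  # -> (60000, 28, 28)
```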

Both LeNet-5 and MNIST were introduced in the following paper.

[LeCun et al., 1998] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, November 1998.

Using this TensorFlow sample (convolutional.py), you can reach an accuracy of about 99.2%.
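
For orientation only, here is a rough sketch of a LeNet-5-style forward pass written against the TensorFlow API of that era; the layer sizes are illustrative and are not copied from convolutional.py:

```
import tensorflow as tf

def lenet5_logits(images):
    """Forward pass for a LeNet-5-style network; images is [batch, 28, 28, 1]."""
    # Two 5x5 convolution + 2x2 max-pooling stages, then two fully connected layers.
    conv1_w = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
    conv1_b = tf.Variable(tf.zeros([32]))
    conv2_w = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
    conv2_b = tf.Variable(tf.constant(0.1, shape=[64]))
    fc1_w = tf.Variable(tf.truncated_normal([7 * 7 * 64, 512], stddev=0.1))
    fc1_b = tf.Variable(tf.constant(0.1, shape=[512]))
    fc2_w = tf.Variable(tf.truncated_normal([512, 10], stddev=0.1))
    fc2_b = tf.Variable(tf.constant(0.1, shape=[10]))

    x = tf.nn.relu(tf.nn.conv2d(images, conv1_w, [1, 1, 1, 1], 'SAME') + conv1_b)
    x = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')   # 28x28 -> 14x14
    x = tf.nn.relu(tf.nn.conv2d(x, conv2_w, [1, 1, 1, 1], 'SAME') + conv2_b)
    x = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')   # 14x14 -> 7x7
    x = tf.reshape(x, [-1, 7 * 7 * 64])
    x = tf.nn.relu(tf.matmul(x, fc1_w) + fc1_b)
    return tf.matmul(x, fc2_w) + fc2_b                          # 10 class scores
```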

Install TensorFlow on AWS EC2

  1. On AWS EC2, choose the Ubuntu Server 16.04 LTS (HVM), SSD Volume Type AMI and the t2.micro instance type.

  2. PuTTY settings (for Windows users)

  3. Anaconda + TensorFlow: install TensorFlow on top of Anaconda so that Python is easy to work with later.

After installation, log out and log back in so that your PATH is picked up, then check that Anaconda works.

```
$ conda -V
conda 4.2.9
```

Install TensorFlow into Anaconda's tensorflow environment. (If the tensorflow environment does not exist yet, `conda create -n tensorflow python=3.5` will create it.)

```
$ source activate tensorflow
(tensorflow)$ conda install -c conda-forge tensorflow

The following NEW packages will be INSTALLED:

    mkl:        11.3.3-0
    mock:       2.0.0-py35_0   conda-forge
    numpy:      1.11.2-py35_0
    pbr:        1.10.0-py35_0  conda-forge
    protobuf:   3.0.0b2-py35_0 conda-forge
    six:        1.10.0-py35_0  conda-forge
    tensorflow: 0.10.0-py35_0  conda-forge

Proceed ([y]/n)? y
```

By the way, here is the command to leave Anaconda's tensorflow environment.

```
(tensorflow)$ source deactivate
```

And here is the command to enter the tensorflow environment again next time.

```
$ source activate tensorflow
```

If you do not switch to the tensorflow environment, importing TensorFlow fails with the following error.

```
>>> import tensorflow as tf
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 ImportError: No module named 'tensorflow'
```

Check the directory where TensorFlow is installed.

```
(tensorflow)$ python -c 'import os; import inspect; import tensorflow; print(os.path.dirname(inspect.getfile(tensorflow)))'
/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow
```
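
As an optional smoke test inside the activated environment (this hello-world snippet is standard TensorFlow 0.x usage, not a step from the original article):

```
import tensorflow as tf

# Build a trivial graph and run it once to confirm the installation works.
hello = tf.constant('Hello, TensorFlow!')
with tf.Session() as sess:
    print(sess.run(hello))
```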

Now you are ready to go.

Run LeNet-5 MNIST on AWS EC2 t2.micro

Let's run it.

```
(tensorflow)$ python /home/ubuntu/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/models/image/mnist/convolutional.py
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Initialized!
Step 0 (epoch 0.00), 6.8 ms
Minibatch loss: 12.053, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 432.6 ms
Minibatch loss: 3.276, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.2%
Step 200 (epoch 0.23), 435.2 ms
Minibatch loss: 3.457, learning rate: 0.010000
Minibatch error: 14.1%
Validation error: 3.9%
Step 300 (epoch 0.35), 430.3 ms
Minibatch loss: 3.204, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 3.1%
Step 400 (epoch 0.47), 431.9 ms
Minibatch loss: 3.211, learning rate: 0.010000
Minibatch error: 9.4%
Validation error: 2.5%
```

Training proceeds smoothly at first, and the status is printed every 100 steps.

| Item | Meaning |
| --- | --- |
| Step | Number of training steps (minibatch updates) completed |
| epoch | Number of passes through the entire training set |
| ms | Average time per step over the last 100 steps |
| Minibatch loss | Loss on the current minibatch (a small subset of the training data) |
| learning rate | Parameter controlling how cautiously the weights are updated |
| Minibatch error | Error rate on the current minibatch |
| Validation error | Error rate on the validation data |

Ultimately, the goal is to reduce the validation error.
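
The error percentages above are computed from the model's predictions; a minimal illustrative helper (not copied from convolutional.py) would look like this:

```
import numpy

# Percentage of samples whose argmax prediction does not match the integer label.
def error_rate(predictions, labels):
    correct = numpy.sum(numpy.argmax(predictions, 1) == labels)
    return 100.0 - (100.0 * correct / predictions.shape[0])
```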

The parameters of the program are as follows.

| Item | Setting |
| --- | --- |
| Stopping condition | epoch > 10 |
| Batch size | 64 |
| Activation function | ReLU |
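
These settings correspond to constants defined near the top of convolutional.py, roughly as follows (values taken from the table above; the surrounding code is omitted):

```
BATCH_SIZE = 64        # minibatch size
NUM_EPOCHS = 10        # stop after 10 passes over the training data
EVAL_FREQUENCY = 100   # print a status line every 100 steps
```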

Training proceeds smoothly for a while, but partway through it slows down to about 1/10 speed.

```
Step 5000 (epoch 5.82), 434.0 ms
Step 5100 (epoch 5.93), 431.1 ms
Step 5200 (epoch 6.05), 430.0 ms
Step 5300 (epoch 6.17), 434.3 ms
Step 5400 (epoch 6.28), 533.1 ms
Step 5500 (epoch 6.40), 581.7 ms
Step 5600 (epoch 6.52), 581.4 ms
Step 5700 (epoch 6.63), 580.6 ms
Step 5800 (epoch 6.75), 582.4 ms
Step 5900 (epoch 6.87), 785.4 ms
Step 6000 (epoch 6.98), 975.2 ms
Step 6100 (epoch 7.10), 969.0 ms
Step 6200 (epoch 7.21), 2485.7 ms
Step 6300 (epoch 7.33), 4477.5 ms
Step 6400 (epoch 7.45), 4492.2 ms
Step 6500 (epoch 7.56), 3791.0 ms
Step 6600 (epoch 7.68), 4414.7 ms
Step 6700 (epoch 7.80), 4485.0 ms
Step 6800 (epoch 7.91), 4259.3 ms
Step 6900 (epoch 8.03), 3942.3 ms
```

Looking at the monitoring graphs, CPU utilization is 100% at the beginning but gets throttled down to 10% partway through. SlowRun_CPU使用率.png

There is plenty of memory on the t2.micro instance.

```
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           990M        374M        372M        4.4M        243M        574M
Swap:            0B          0B          0B
```

CPU credit

CPU credit restrictions

If you check the CPU credit balance, you can see that the roughly 30 credits granted at launch have been used up.

** CPU credit balance ** SlowRun_CPUクレジット残高.png

** CPU credit usage ** SlowRun_CPUクレジット使用状況.png

So what does the CPU credit value mean? One CPU credit is the right to run the CPU at 100% for one minute, so the balance is essentially a savings account of full-speed minutes. Accordingly, while credits were still being spent at 100% CPU, the CPU credit usage graph shows about 5 credits being consumed every 5 minutes.

The CPU used in t2.micro is "Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz".

```
$ cat /proc/cpuinfo | grep "model name"
model name      : Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
```

A t2.micro is entitled to 10% of the CPU's performance as its baseline. That 10% is delivered as 6 CPU credits per hour: since 1 credit lets you use 100% of the CPU for 1 minute, 6 credits per hour (60 minutes) works out to exactly 10%.

CPU credits can be banked for up to 24 hours' worth of accrual, so a t2.micro can hold at most 144 credits (6 credits/hour x 24 hours). Immediately after a t2.micro instance is created, about 30 credits are granted. If you stop the instance, the CPU credit balance is reset to 0.

Running TensorFlow's LeNet-5 MNIST on a t2.micro takes about 60 minutes of sustained 100% CPU, i.e. about 60 credits. If you create an instance and run it right away, the initial 30 credits leave you about 30 short; on a t2.micro that was stopped and restarted, with its credits reset to 0, you are about 60 credits short.

Immediately after creating an instance you are therefore about 30 credits short, and it takes roughly 6 hours either way: whether you first wait about 5 hours for the missing 30 credits to accumulate and then run at full speed, or simply keep running slowly while short of credits, the finish time measured from instance startup is about the same.
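
As a back-of-the-envelope check of that claim, here is a small sketch using the article's own figures (30 credits at launch, 6 credits earned per hour, about 60 minutes of work at 100% CPU); the real credit accounting is more fine-grained than this:

```
CREDITS_AT_LAUNCH = 30     # credits granted right after instance creation
EARN_PER_HOUR = 6.0        # t2.micro earns 6 credits per hour (the 10% baseline)
JOB_MINUTES_AT_100 = 60    # LeNet-5 MNIST needs ~60 minutes at 100% CPU

# Option A: run immediately. The first ~30 minutes run at 100%, then the
# remaining work crawls along at the 10% baseline (6 full-speed minutes/hour).
full_speed_min = CREDITS_AT_LAUNCH
remaining_min = JOB_MINUTES_AT_100 - full_speed_min
option_a_hours = full_speed_min / 60.0 + remaining_min / EARN_PER_HOUR

# Option B: wait until 60 credits are banked, then run the whole job at 100%.
wait_hours = (JOB_MINUTES_AT_100 - CREDITS_AT_LAUNCH) / EARN_PER_HOUR
option_b_hours = wait_hours + JOB_MINUTES_AT_100 / 60.0

print("run immediately: about %.1f hours" % option_a_hours)  # ~5.5 hours
print("wait, then run : about %.1f hours" % option_b_hours)  # ~6.0 hours
```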

Consider how to use CPU credits efficiently

Considering the CPU credit restrictions, let's think about how to use the CPU efficiently.

  1. **Create a new instance** If you just want to top up CPU credits, you can back the instance up to an AMI and recreate it from that AMI each time; each new instance starts with about 30 credits.

  2. **Use the slowdown period (about 15 minutes)** One way to squeeze out extra CPU time without changing the program much is to exploit the fact that the throttle is applied gradually. When the CPU credits reach 0 while the CPU is at 100%, utilization is ramped down to 10% over roughly 15 minutes rather than cut off instantly. So if you pause the program once the credits hit 0, let some credits accumulate, and then resume, you can use more than 10% of the CPU for up to 15 minutes even with an empty balance, and the computation gets a little further. This method has downsides, though. If the period of 100% utilization was short, the drop to 10% is also quick: falling from a sustained 100% to 10% takes about 15 minutes, but falling from a momentary 40% to 10% takes only about 5 minutes. Utilization also does not jump from 0% straight to 100% even when credits are available; it takes about 6 minutes to climb back to 100%, which means banking about 6 minutes' worth of credits, i.e. stopping the computation for about 60 minutes. But if you simply keep computing slowly for those 60 minutes, you complete the equivalent of 6 minutes at 100% anyway, which is exactly what stopping gives up. Because the ramp-down is fast and the ramp-up is slow, repeatedly stopping and restarting the computation turns out to be **not very effective**. Still, let's try adding the sleep logic shown below.

```
start_time = time.time()
...
cpu_start_time = time.time()  # ★ Start time
for step in xrange(int(num_epochs * train_size) // BATCH_SIZE):
    ...
    if step % EVAL_FREQUENCY == 0:
        elapsed_time = time.time() - start_time
        avrg_time = 1000 * elapsed_time / EVAL_FREQUENCY
        ...
        # ★ Sleep for 50 minutes when the average time per step exceeds 3000 ms
        if avrg_time > 3000:
            print("sleep t2.micro cpu. passed time=%d" % (time.time() - cpu_start_time))
            time.sleep(3000)
            print("run t2.micro cpu. passed time=%d" % (time.time() - cpu_start_time))

        start_time = time.time()
```

The graphs look like this. Instead of grinding along at 10% the whole time, the computation now runs in short bursts at 100%.

** CPU usage ** Sleep_CPU Usage.png

** CPU credit usage ** Sleep_CPU Credit Usage.png

** CPU credit balance ** Sleep_CPU Credit Balance.png

  3. **Relay multiple instances** The system gets more complicated, but it also seems effective to use AWS alarms to switch between instances, repeatedly creating and deleting t2.micro instances while saving progress to S3. That said, if you are going to build a system like that, some would say you might as well pay for a higher-performance instance.
  4. **Create an instance in advance** If you start a t2.micro and leave it idle for 24 hours, CPU credits accumulate up to the 144-credit cap. If the computation fits within 144 minutes of 100% CPU, you can create the instance the day before and just leave it alone.
  5. **AWS Lambda** Each request has to finish within 300 seconds, but if that works out it becomes an option. Running on the same class of CPU as EC2 t2.micro, it looks feasible if the job is split into chunks of about 500 steps per invocation: roughly 230 seconds of computation, 60 seconds to load and save the training progress, and 10 seconds of margin. The computation fits in 512 MB of memory, so Lambda's free tier would allow about 9 days of continuous computation. Note, however, that Lambda's pricing works differently from EC2: the more memory you allocate, the faster the CPU you get. To reduce the number of 300-second hand-offs as much as possible, you could use the 1536 MB maximum, in which case the monthly free tier covers about 3 days (266,667 seconds). See here for the AWS Lambda machine spec survey results (link). The rough free-tier arithmetic is sketched after this list.
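
Here is a minimal sketch of the free-tier arithmetic behind option 5. The 400,000 GB-seconds per month figure is the Lambda free allowance of that era, consistent with the numbers quoted above:

```
FREE_GB_SECONDS = 400000.0   # Lambda monthly free compute allowance (GB-seconds)

for memory_mb in (512, 1536):
    seconds_free = FREE_GB_SECONDS / (memory_mb / 1024.0)
    days_free = seconds_free / 86400.0
    print("%4d MB: %9.0f s of free compute per month (~%.1f days)"
          % (memory_mb, seconds_free, days_free))

# 512 MB  -> 800,000 s  (~9.3 days)
# 1536 MB -> 266,667 s  (~3.1 days)
```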

AlexNet MNIST

Next, let's run the AlexNet benchmark on EC2 t2.micro (free tier) and compare it with GPUs. The program is located under the TensorFlow installation directory:

tensorflow/models/image/alexnet/alexnet_benchmark.py

According to the comments in the program, GPUs deliver roughly the following performance.

```
Forward pass:
Run on Tesla K40c: 145 +/- 1.5 ms / batch
Run on Titan X:     70 +/- 0.1 ms / batch

Forward-backward pass:
Run on Tesla K40c: 480 +/- 48 ms / batch
Run on Titan X:    244 +/- 30 ms / batch
```

On the other hand, here is the console log from the EC2 t2.micro run. Even while the t2.micro is running at 100% CPU, the GPU is roughly 100 times faster.

```
conv1   [128, 56, 56, 64]
pool1   [128, 27, 27, 64]
conv2   [128, 27, 27, 192]
pool2   [128, 13, 13, 192]
conv3   [128, 13, 13, 384]
conv4   [128, 13, 13, 256]
conv5   [128, 13, 13, 256]
pool5   [128, 6, 6, 256]
2016-10-24 16:18:03.743222: step 10, duration = 9.735
2016-10-24 16:19:40.927811: step 20, duration = 9.675
2016-10-24 16:21:17.593104: step 30, duration = 9.664
2016-10-24 16:22:53.894240: step 40, duration = 9.684
2016-10-24 16:24:29.968737: step 50, duration = 9.597
2016-10-24 16:26:06.527066: step 60, duration = 9.686
2016-10-24 16:27:43.229298: step 70, duration = 9.689
2016-10-24 16:29:19.643403: step 80, duration = 9.679
2016-10-24 16:30:56.202710: step 90, duration = 9.588
2016-10-24 16:32:22.877673: Forward across 100 steps, 9.553 +/- 0.962 sec / batch
2016-10-24 16:42:27.229588: step 10, duration = 28.700
2016-10-24 16:49:33.216683: step 20, duration = 72.885
...
```

Then, around step 20 of the forward-backward pass, the 30 CPU credits run out. After that the instance gradually slows down to 10% CPU, and the computation drags on endlessly at roughly 1/1000 of the GPU's speed. Hmm. It's no contest at all.
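
As a quick sanity check of the "about 100 times" and "1/1000" figures, using the step times from the log above and the GPU numbers from the benchmark's comments (the flat 10x slowdown after the credits run out is the article's approximation):

```
# t2.micro step times come from the log above; GPU times come from the
# comments quoted from alexnet_benchmark.py.
t2_forward = 9.553        # sec/batch, forward pass at 100% CPU
titanx_forward = 0.070    # sec/batch, Titan X forward pass
print("forward pass at 100%% CPU: %.0fx slower than Titan X"
      % (t2_forward / titanx_forward))                       # ~136x

t2_fwd_bwd = 28.7         # sec/batch, forward-backward at 100% CPU (step 10)
titanx_fwd_bwd = 0.244    # sec/batch, Titan X forward-backward
throttled = t2_fwd_bwd / 0.10                                 # after credits run out
print("forward-backward at 10%% CPU: %.0fx slower than Titan X"
      % (throttled / titanx_fwd_bwd))                         # ~1180x
```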

In conclusion

Let's change the viewpoint a little.

How can I get a sense of accomplishment by running TensorFlow's LeNet-5 MNIST (convolutional.py) on AWS EC2 t2.micro (free tier)?

Immediately after creating an instance you have about 30 CPU credits, so let's configure the run to finish within 30 to 40 minutes.

Modified convolutional.py


```
NUM_EPOCHS = 6
```

That's right, there is only one change: reduce the number of epochs so that training finishes within the CPU credits you have.

Let's enjoy deep learning with TensorFlow!
