Introduction

Deep learning naturally calls for substantial computational resources. If your PC's specifications are not up to it, quickly setting up a GPU instance on EC2 is a good option.

The framework is Keras, with the GPU version of TensorFlow as the backend, all set up inside a pyenv virtual environment. TensorFlow 1.0, released the other day, apparently includes tf.keras, but I haven't tried it yet, so I will use the regular Keras package.

For the environment setup, I referred to this article: Run TensorFlow on AWS GPU instance

Create an instance

EC2 instance: Ubuntu Server 16.04 LTS (HVM), SSD Volume Type (from Quick Start)
Instance type: g2.2xlarge
Storage etc.: default configuration (15 GB memory, 8 GB storage)

Setup

First, log in with SSH

ssh -i ~/[ec2key].pem ubuntu@[Instance IP]

Create a symbolic link to make ephemeral storage a working directory

CUDA is quite large, so the root volume runs out of free space partway through the setup. To use the ephemeral storage as a work area, replace /tmp with a symbolic link to /mnt/tmp.

sudo mkdir /mnt/tmp
sudo chmod 777 /mnt/tmp
sudo rm -rf /tmp
sudo ln -s /mnt/tmp /tmp
cd /tmp
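To see why the root volume fills up, you can compare the free space on / with the ephemeral volume. A minimal sketch, assuming the ephemeral storage is mounted at /mnt as on this instance; the helper name `free_gib` is hypothetical.

```python
import shutil

def free_gib(path):
    """Return the free space at `path` in GiB, via shutil.disk_usage."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free) in bytes
    return usage.free / 2**30

if __name__ == "__main__":
    # Assumption: the ephemeral volume is mounted at /mnt, as on this g2.2xlarge.
    for mount in ("/", "/mnt"):
        try:
            print("{}: {:.1f} GiB free".format(mount, free_gib(mount)))
        except OSError as exc:  # e.g. /mnt does not exist on this machine
            print("{}: unavailable ({})".format(mount, exc))
```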

Upgrade Ubuntu packages to the latest versions

sudo apt-get update
sudo apt-get upgrade -y

Set the locale

After the upgrade, locale-related warnings keep appearing, which is annoying, so set the locale.

sudo apt-get install language-pack-ja
sudo update-locale LANG=ja_JP.UTF-8

Install the packages required for the setup

sudo apt-get install python
sudo apt-get install -y build-essential python-pip python-dev git python-numpy swig default-jdk zip zlib1g-dev ipython

Blacklist Nouveau to avoid conflicts with the NVIDIA driver

echo -e "blacklist nouveau\nblacklist lbm-nouveau\noptions nouveau modeset=0\nalias nouveau off\nalias lbm-nouveau off\n" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
sudo reboot
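Once the instance is back up, you may want to confirm that Nouveau is no longer loaded before installing the NVIDIA driver. `lsmod` reads its data from /proc/modules, where each line starts with the module name. A minimal sketch; the helper `module_loaded` is a hypothetical name.

```python
def module_loaded(name, modules_text):
    """Return True if kernel module `name` appears in /proc/modules-style text."""
    for line in modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0] == name:
            return True
    return False

if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            print("nouveau loaded:", module_loaded("nouveau", f.read()))
    except OSError:  # not on Linux, or /proc unavailable
        print("/proc/modules not readable here")
```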

After rebooting, log in and install linux-image-extra-virtual.

ssh -i ~/[ec2key].pem ubuntu@[Instance IP]
sudo apt-get install -y linux-image-extra-virtual
sudo reboot

After rebooting, log in and install linux-headers.

ssh -i ~/[ec2key].pem ubuntu@[Instance IP]
sudo apt-get install -y linux-source linux-headers-`uname -r`

Set up CUDA Toolkit v8

The latest version at the time of writing is 8.0. Download and install it.

cd /tmp
wget https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda_8.0.44_linux-run
chmod +x cuda_8.0.44_linux-run
./cuda_8.0.44_linux-run  -extract=`pwd`/nvidia_installers
cd nvidia_installers/
sudo ./NVIDIA-Linux-x86_64-367.48.run
# Select Accept. Building the kernel module takes a while, even after the progress bar reaches 100%.
#Select OK
#Select OK
#Select Yes
#Select OK

sudo modprobe nvidia
sudo ./cuda-linux64-rel-8.0.44-21122537.run
# The README is displayed; press q to exit it.
# Type accept
# install Path: default
# shortcut?: default

Set up cuDNN

First, download cuDNN to your local machine from https://developer.nvidia.com/cudnn. cuDNN cannot be downloaded without registering as an NVIDIA Developer, so there is no way around this step.

From your local terminal, transfer the downloaded cuDNN archive to the EC2 instance.

scp -i [ec2key].pem cudnn-8.0-linux-x64-v5.1.tgz ubuntu@[Instance IP]:/tmp

Back on the EC2 instance:

cd /tmp
tar -xzf cudnn-8.0-linux-x64-v5.1.tgz
sudo mv ./cuda/lib64/* /usr/local/cuda/lib64/
sudo mv ./cuda/include/* /usr/local/cuda/include/
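Since cuDNN is copied in by hand, a quick check that the expected files landed in /usr/local/cuda/lib64 can save debugging later. A minimal sketch; the file names are assumptions based on the CUDA 8.0 and cuDNN 5.1 versions used here (the TensorFlow log further down opens libcublas.so.8.0 and libcudnn.so.5), and `missing_libs` is a hypothetical helper.

```python
import os

# Assumed library names for CUDA 8.0 / cuDNN 5.1; adjust for other versions.
EXPECTED = ["libcudart.so.8.0", "libcublas.so.8.0", "libcudnn.so.5"]

def missing_libs(lib_dir, expected=EXPECTED):
    """Return the names in `expected` that do not exist in `lib_dir`."""
    return [name for name in expected
            if not os.path.exists(os.path.join(lib_dir, name))]

if __name__ == "__main__":
    missing = missing_libs("/usr/local/cuda/lib64")
    print("missing:", missing or "none")
```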

Add the following to ~/.bashrc

`.bashrc`


# cuDNN
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda

source ~/.bashrc
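To confirm the export took effect in the current shell, you can inspect LD_LIBRARY_PATH from Python. Because the .bashrc line appends with a leading `$LD_LIBRARY_PATH:`, the value starts with an empty entry when the variable was previously unset; the sketch skips empty entries. `on_library_path` is a hypothetical helper.

```python
import os

def on_library_path(directory, value=None):
    """Return True if `directory` is one of the entries of LD_LIBRARY_PATH."""
    if value is None:
        value = os.environ.get("LD_LIBRARY_PATH", "")
    # Entries are colon-separated; skip empty ones such as a leading ':'.
    entries = [os.path.normpath(p) for p in value.split(":") if p]
    return os.path.normpath(directory) in entries

if __name__ == "__main__":
    print("cuda lib64 on path:", on_library_path("/usr/local/cuda/lib64"))
```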

Building a Python virtual environment

python -V
Python 2.7.12

sudo apt-get install git gcc make openssl libssl-dev libbz2-dev libreadline-dev libsqlite3-dev
git clone https://github.com/yyuu/pyenv.git ~/.pyenv
git clone https://github.com/yyuu/pyenv-virtualenv.git ~/.pyenv/plugins/pyenv-virtualenv

Create ~/.bash_profile with the following contents

`.bash_profile`


# pyenv
export PYENV_ROOT=$HOME/.pyenv
export PATH=$PYENV_ROOT/bin:$PATH
eval "$(pyenv init -)"
# virtualenv
eval "$(pyenv virtualenv-init -)"
export PYENV_VIRTUALENV_DISABLE_PROMPT=1

source ~/.bash_profile

Create a virtual environment for Keras

pyenv install 3.5.3
pyenv virtualenv 3.5.3 keras
pyenv activate keras
python -V
Python 3.5.3
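To double-check from inside Python that the keras environment is active, you can compare the interpreter's prefixes. A minimal sketch, assuming the environment was created either with the standard venv module (where `sys.prefix` differs from `sys.base_prefix`) or with the older virtualenv tool (which sets `sys.real_prefix`); which of the two pyenv-virtualenv uses depends on its version and what is installed.

```python
import sys

def in_virtualenv():
    """Return True when running inside a venv or virtualenv environment."""
    # Older virtualenv releases set sys.real_prefix; venv makes prefix != base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```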

Install TensorFlow

pip install tensorflow-gpu

Check the TensorFlow version. If the output looks like the following, with the CUDA libraries reported as successfully opened, the GPU can be used.

python -c 'import tensorflow as tf; print(tf.__version__)'
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
1.0.0
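The "successfully opened CUDA library" lines are the signal to look for. A minimal sketch that extracts the library names from captured output; the sample lines are copied from the log above, and `opened_cuda_libs` is a hypothetical helper. TensorFlow's C++ logging typically goes to stderr, so you would capture that stream (e.g. with `2>&1`) when feeding real output to it.

```python
import re

# Matches TensorFlow 1.0 dso_loader messages such as
# "... successfully opened CUDA library libcublas.so.8.0 locally"
PATTERN = re.compile(r"successfully opened CUDA library (\S+) locally")

def opened_cuda_libs(log_text):
    """Return the CUDA library names reported as opened in a TensorFlow log."""
    return PATTERN.findall(log_text)

SAMPLE = """\
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
"""

if __name__ == "__main__":
    libs = opened_cuda_libs(SAMPLE)
    print(libs)
    print("cuDNN loaded:", any(name.startswith("libcudnn") for name in libs))
```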

Install Keras

pip install pillow
pip install h5py
pip install matplotlib
pip install keras
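Keras chooses its backend from the KERAS_BACKEND environment variable if it is set, and otherwise from ~/.keras/keras.json, which Keras typically writes on first import. A minimal sketch that reports which backend would be used without importing Keras; falling back to "tensorflow" when neither source is present is an assumption matching the "Using TensorFlow backend." message below, and `configured_backend` is a hypothetical helper.

```python
import json
import os

def configured_backend(config_path=None, environ=None):
    """Return the Keras backend name from KERAS_BACKEND or keras.json."""
    environ = os.environ if environ is None else environ
    if "KERAS_BACKEND" in environ:  # the environment variable takes precedence
        return environ["KERAS_BACKEND"]
    if config_path is None:
        config_path = os.path.expanduser("~/.keras/keras.json")
    try:
        with open(config_path) as f:
            return json.load(f).get("backend", "tensorflow")
    except (OSError, ValueError):  # missing file or invalid JSON
        return "tensorflow"  # assumed default, per the log message below

if __name__ == "__main__":
    print(configured_backend())
```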

Try running the Keras examples

cd /tmp
git clone https://github.com/fchollet/keras.git
cd keras/examples

First, try the example that trains a CNN on MNIST.

python mnist_cnn.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.pkl.gz
X_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
60000/60000 [==============================] - 13s - loss: 0.3770 - acc: 0.8839 - val_loss: 0.0932 - val_acc: 0.9709
Epoch 2/12
60000/60000 [==============================] - 11s - loss: 0.1363 - acc: 0.9603 - val_loss: 0.0632 - val_acc: 0.9801
Epoch 3/12
60000/60000 [==============================] - 11s - loss: 0.1064 - acc: 0.9687 - val_loss: 0.0509 - val_acc: 0.9835
Epoch 4/12
60000/60000 [==============================] - 11s - loss: 0.0900 - acc: 0.9736 - val_loss: 0.0443 - val_acc: 0.9857
Epoch 5/12
60000/60000 [==============================] - 11s - loss: 0.0769 - acc: 0.9775 - val_loss: 0.0405 - val_acc: 0.9865
Epoch 6/12
60000/60000 [==============================] - 11s - loss: 0.0689 - acc: 0.9795 - val_loss: 0.0371 - val_acc: 0.9870
Epoch 7/12
60000/60000 [==============================] - 11s - loss: 0.0649 - acc: 0.9803 - val_loss: 0.0361 - val_acc: 0.9881
Epoch 8/12
60000/60000 [==============================] - 11s - loss: 0.0594 - acc: 0.9823 - val_loss: 0.0356 - val_acc: 0.9886
Epoch 9/12
60000/60000 [==============================] - 11s - loss: 0.0547 - acc: 0.9841 - val_loss: 0.0321 - val_acc: 0.9889
Epoch 10/12
60000/60000 [==============================] - 11s - loss: 0.0525 - acc: 0.9841 - val_loss: 0.0320 - val_acc: 0.9889
Epoch 11/12
60000/60000 [==============================] - 11s - loss: 0.0506 - acc: 0.9850 - val_loss: 0.0323 - val_acc: 0.9892
Epoch 12/12
60000/60000 [==============================] - 11s - loss: 0.0471 - acc: 0.9856 - val_loss: 0.0314 - val_acc: 0.9897
Test score: 0.0314083654978
Test accuracy: 0.9897

Execution time: 2:23 (excluding the data download). The same script took about 35 minutes on my MacBook Air, so this is more than 10 times faster, at roughly 11 seconds per epoch.
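The figures in this paragraph can be checked with a little arithmetic. A sketch using the per-epoch times read off the log above (13 s for the first epoch, 11 s for the other eleven) and the rough 35-minute MacBook Air figure; the epochs alone account for 134 s of the 2:23 total.

```python
# Per-epoch times (seconds) read off the MNIST log above: 13 s, then 11 s x 11.
EPOCH_SECONDS = [13] + [11] * 11

def speedup(slow_seconds, fast_seconds):
    """Ratio of two wall-clock times; > 1 means the second run was faster."""
    return slow_seconds / fast_seconds

gpu_total = 2 * 60 + 23   # 2:23 on the g2.2xlarge, excluding the download
mba_total = 35 * 60       # roughly 35 minutes on the MacBook Air

print("mean epoch: {:.1f} s".format(sum(EPOCH_SECONDS) / len(EPOCH_SECONDS)))  # mean epoch: 11.2 s
print("speedup: {:.1f}x".format(speedup(mba_total, gpu_total)))                # speedup: 14.7x
```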

I also tried the IMDB CNN-LSTM example, which looks heavier.

python imdb_cnn_lstm.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
X_train shape: (25000, 100)
X_test shape: (25000, 100)
Build model...
Train...
Train on 25000 samples, validate on 25000 samples
Epoch 1/2
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
   30/25000 [..............................] - ETA: 1397s - loss: 0.6936 - acc: 0.4333I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3811 get requests, put_count=2890 evicted_count=1000 eviction_rate=0.346021 and unsatisfied allocation rate=0.530307
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
  360/25000 [..............................] - ETA: 160s - loss: 0.6935 - acc: 0.4833I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2156 get requests, put_count=2374 evicted_count=1000 eviction_rate=0.42123 and unsatisfied allocation rate=0.373377
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 256 to 281
  870/25000 [>.............................] - ETA: 94s - loss: 0.6925 - acc: 0.5287I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 4249 get requests, put_count=4491 evicted_count=1000 eviction_rate=0.222668 and unsatisfied allocation rate=0.192281
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 655 to 720
25000/25000 [==============================] - 63s - loss: 0.3815 - acc: 0.8210 - val_loss: 0.3519 - val_acc: 0.8456
Epoch 2/2
25000/25000 [==============================] - 60s - loss: 0.1970 - acc: 0.9238 - val_loss: 0.3471 - val_acc: 0.8534
24990/25000 [============================>.] - ETA: 0sTest score: 0.347144101623
Test accuracy: 0.853440059948

Execution time: 2:25 (excluding the data download). On the MacBook Air it took more than 40 minutes, so the GPU instance wins overwhelmingly.

Encouraged, let's try something that looks even heavier. mnist_acgan.py trains an ACGAN (Auxiliary Classifier Generative Adversarial Network) on MNIST. Is it a relative of DCGAN? The details seem to be described here, but they are difficult, so I will leave them for later. Being a GAN, it should be heavy in any case. Let's see how it goes.

python mnist_acgan.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Epoch 1 of 50
  0/600 [..............................] - ETA: 0sI tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 3.74GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
599/600 [============================>.] - ETA: 1s   
Testing for epoch 1:
component              | loss | generation_loss | auxiliary_loss
-----------------------------------------------------------------
generator (train)      | 3.88 | 1.49            | 2.39 
generator (test)       | 3.36 | 1.04            | 2.32 
discriminator (train)  | 2.11 | 0.53            | 1.58 
discriminator (test)   | 2.12 | 0.70            | 1.43 
Epoch 2 of 50
 44/600 [=>............................] - ETA: 732s

One epoch took about 15 minutes, with 48 epochs still to go. That would take forever and cost too much, so I stopped partway through. Still, if you let it run for about half a day, you should get results, which is impressive.
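The "half a day" estimate follows from 50 epochs at roughly 15 minutes each. As a sketch, the per-epoch time can also be cross-checked against the progress-bar ETA in the log (732 s remaining with 44 of 600 batches done); the ETA covers only the training batches, so the rest of the 15 minutes is presumably the per-epoch test pass.

```python
def total_hours(epochs, minutes_per_epoch):
    """Total training time in hours for a fixed per-epoch cost."""
    return epochs * minutes_per_epoch / 60

def epoch_minutes_from_eta(done, total, eta_seconds):
    """Estimate the training part of one epoch from a Keras progress-bar ETA."""
    seconds_per_batch = eta_seconds / (total - done)
    return seconds_per_batch * total / 60

print(total_hours(50, 15))                             # 12.5
print(round(epoch_minutes_from_eta(44, 600, 732), 1))  # 13.2
```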

Disk usage after the work

df -k
Filesystem     1K-blocks    Used Available Use% Mounted on
udev             7679880       0   7679880   0% /dev
tmpfs            1539900    8800   1531100   1% /run
/dev/xvda1       8117828 6149060   1533492  81% /
tmpfs            7699496       0   7699496   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs            7699496       0   7699496   0% /sys/fs/cgroup
/dev/xvdb       66946696 3017852  60521484   5% /mnt
tmpfs            1539904       0   1539904   0% /run/user/1000

The system disk is 81% full. Ephemeral storage is wiped when the instance is stopped, so if the default 8 GB root volume is a concern, increase the storage size when you launch the instance.
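To keep an eye on the root volume, the df -k output can be parsed. A sketch that recomputes Use% the way GNU df reports it, used / (used + available) rounded up, which is why / shows 81% even though 6149060 of 8117828 blocks is only about 76% (the difference is blocks reserved for root). The excerpt is copied from the output above; the 80% threshold and the helper names are arbitrary choices for illustration.

```python
DF_OUTPUT = """\
Filesystem     1K-blocks    Used Available Use% Mounted on
udev             7679880       0   7679880   0% /dev
tmpfs            1539900    8800   1531100   1% /run
/dev/xvda1       8117828 6149060   1533492  81% /
/dev/xvdb       66946696 3017852  60521484   5% /mnt
"""

def parse_df(text):
    """Parse `df -k` output into (mount point, used KiB, available KiB) tuples."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 6:
            rows.append((fields[5], int(fields[2]), int(fields[3])))
    return rows

def use_percent(used, available):
    """Use% as GNU df reports it: used / (used + available), rounded up."""
    total = used + available
    if total == 0:
        return 0
    return -(-used * 100 // total)  # integer ceiling division

def nearly_full(text, threshold=80):
    """Return mount points whose Use% is at or above `threshold`."""
    return [mount for mount, used, avail in parse_df(text)
            if use_percent(used, avail) >= threshold]

print(nearly_full(DF_OUTPUT))  # ['/']
```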

An error occurs when launching a g2.xlarge instance from an AMI of this setup

I took a snapshot of the instance I had built, created an AMI from it, and launched a g2.xlarge instance from that AMI. Then I checked whether it works properly.

pyenv activate keras
python -V
Python 3.5.3
python -c 'import tensorflow as tf; print(tf.__version__)'
Traceback (most recent call last):
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 61, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
  File "/home/ubuntu/.pyenv/versions/3.5.3/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/ubuntu/.pyenv/versions/3.5.3/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 72, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 61, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/home/ubuntu/.pyenv/versions/keras/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
  File "/home/ubuntu/.pyenv/versions/3.5.3/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/ubuntu/.pyenv/versions/3.5.3/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

TensorFlow throws an error: it cannot find libcudart.so.8.0, the CUDA runtime library. I'm not sure why yet. Whether this environment can be turned into a reusable AMI needs further investigation; for now, if you only use it intermittently, it may be better to simply stop the instance and keep it.
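The traceback comes down to the dynamic loader not finding libcudart.so.8.0. A quick way to test that outside TensorFlow is to ask the loader directly through ctypes, which searches the LD_LIBRARY_PATH entries set when Python started. One hypothesis worth checking (not verified here): bash login shells read ~/.bash_profile and then skip ~/.profile, and on Ubuntu it is ~/.profile that sources ~/.bashrc, so after creating ~/.bash_profile a fresh SSH login may never apply the LD_LIBRARY_PATH line from ~/.bashrc. `can_load` is a hypothetical helper.

```python
import ctypes

def can_load(library):
    """Return True if the dynamic loader can open `library` by name."""
    try:
        ctypes.CDLL(library)
    except OSError:  # raised when dlopen fails, e.g. "cannot open shared object file"
        return False
    return True

if __name__ == "__main__":
    for name in ("libcudart.so.8.0", "libcudnn.so.5"):
        print(name, "OK" if can_load(name) else "NOT FOUND")
```

If these report NOT FOUND while the files do exist under /usr/local/cuda/lib64, running `source ~/.bashrc` before starting Python (or sourcing ~/.bashrc from ~/.bash_profile) would be the first thing to try.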

Summary

Building the Keras environment above takes less than an hour. You can stop the instance and keep it, but rebuilding it from scratch whenever you need it is not much trouble either, so decide based on how often you use it and your use case. In any case, setting up a TensorFlow GPU environment has become much easier than it used to be.

[PYTHON] Build a Keras environment on an AWS EC2 G2 instance (February 2017 version)