[PYTHON] Try Tensorflow with a GPU instance on AWS

EC2 setup

Launch an Ubuntu instance using the GPU instance type p2.xlarge, as shown below. (p2 instances are currently not available in the Tokyo region, so a different region is used here.)

AMI selection: Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-80861296
Instance type selection: p2.xlarge (GPU compute), $0.90/h
Instance details settings: Leave default
Add storage: 64 GiB General Purpose SSD
Add Tags: None
Add Security Group: Create a new security group
sg_01: SSH 22, custom TCP 9999 (for Jupyter)
Confirmation and creation
Create new key pair: kp_01.pem

Choose suitable names for the security group and key pair (here, sg_01 and kp_01). Using an existing key pair also works without any problem.

After downloading the key pair, move it to ~/.ssh and restrict its permissions.

$ mv ~/Downloads/kp_01.pem ~/.ssh/.
$ chmod 600 ~/.ssh/kp_01.pem
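
The chmod step matters because ssh refuses to use a private key file that other users can read. As a minimal sketch (the function names are my own, not part of any tool), the same condition can be checked from Python:

```python
import os
import stat


def key_mode_is_private(mode: int) -> bool:
    """Return True if neither group nor others have any permission bits set."""
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def check_key_file(path: str) -> bool:
    """Check the permission bits of an existing key file (hypothetical helper)."""
    return key_mode_is_private(os.stat(path).st_mode)
```

For example, `check_key_file(os.path.expanduser("~/.ssh/kp_01.pem"))` should be true after the chmod above, assuming the file is at that path.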

After the instance is created, check the Public DNS in the management console and log in with SSH.

$ ssh -i ~/.ssh/kp_01.pem ubuntu@<Public DNS>

The remaining steps are performed on the EC2 instance. First, update the packages.

$ sudo apt-get update
$ sudo apt-get upgrade

CUDA

Install CUDA 8.0

URL: https://developer.nvidia.com/cuda-downloads
Installation guide: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

Pre-installation checks

Check that the instance has a GPU that supports CUDA

$ lspci | grep -i nvidia
00:1e.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

Check that the OS is supported by CUDA

$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.2 LTS"
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
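
The output above can also be checked programmatically. The following is only a sketch: it parses `KEY=value` lines in the style of /etc/os-release and compares them with the values used in this article (x86_64, Ubuntu 16.04). It does not consult the official CUDA support matrix, and the function names are made up for illustration.

```python
def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines (as printed by `cat /etc/*release`) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and lines such as the `uname -m` output
        key, _, value = line.partition("=")
        info[key] = value.strip('"')
    return info


def is_expected_platform(machine: str, info: dict) -> bool:
    """True if the machine matches the environment assumed in this article."""
    return (
        machine == "x86_64"
        and info.get("ID") == "ubuntu"
        and info.get("VERSION_ID") == "16.04"
    )
```

On the instance, `machine` could come from `platform.machine()` and `text` from reading /etc/os-release.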

Install gcc (and development tools)

$ sudo apt-get install build-essential

Install the kernel headers that match the running kernel

$ sudo apt-get install linux-headers-$(uname -r)

Installation

Under "Select Target Platform" at https://developer.nvidia.com/cuda-downloads, choose the options that match this environment to display the download link and installation procedure. Download the file from the linked URL with wget and install it. (Here, cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb is installed.)

$ wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb
$ sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb
$ sudo apt-get update
$ sudo apt-get install cuda

Setting environment variables

Add the following to ~/.bash_profile.

`~/.bash_profile`


export CUDA_HOME="/usr/local/cuda-8.0"
export PATH="${CUDA_HOME}/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

Reload the login shell (or log in again) to apply the settings.

$ exec $SHELL -l
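
The `${PATH:+:${PATH}}` form used in ~/.bash_profile appends `:${PATH}` only when PATH is already set and non-empty. This avoids a trailing colon, which would leave an empty entry that is treated as the current directory. A small Python sketch of the same rule (the function name is hypothetical):

```python
def prepend_to_path(new_dir: str, current: str) -> str:
    """Mimic the shell expression "${new_dir}${PATH:+:${PATH}}"."""
    if current:
        # ${PATH:+...} expands only when PATH is set and non-empty
        return new_dir + ":" + current
    # No existing value: return the new directory without a trailing colon
    return new_dir
```

The same reasoning applies to LD_LIBRARY_PATH, which is often unset on a fresh instance.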

Operation check

Build the sample programs to check that CUDA works. (This step is optional.)

$ cuda-install-samples-8.0.61.sh test
$ cd test/NVIDIA_CUDA-8.0_Samples
$ sed -i "s/nvidia-367/nvidia-375/g" `grep "nvidia-367" -r ./ -l`
$ make

The sed line is needed because the Makefiles included in this version of the sample programs specify the wrong driver version (nvidia-367 instead of nvidia-375). (Reference: https://askubuntu.com/questions/911548/cuda-examples-not-working-after-cuda-8-0-install)
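
Independently of building the samples, a lighter check is to confirm that `nvcc` is on PATH and reports the expected release. The sketch below assumes that the `nvcc --version` output contains a phrase of the form "release 8.0"; the helper names are my own.

```python
import re
import subprocess


def parse_nvcc_release(output: str):
    """Extract (major, minor) from text containing 'release X.Y', else None."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc --version` (must be on PATH) and parse its output."""
    result = subprocess.run(
        ["nvcc", "--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_nvcc_release(result.stdout)
```

If `installed_cuda_release()` raises FileNotFoundError, the PATH setting from ~/.bash_profile has probably not been applied to the current shell.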

Installation of cuDNN 5.1

URL: https://developer.nvidia.com/cudnn
You need to be a member of the NVIDIA Developer Program to download cuDNN. Because the download requires authentication, download the file to your local PC and upload it to EC2 with SCP. (Here, cudnn-8.0-linux-x64-v5.1.tgz is used.)

SCP from local

$ scp -i ~/.ssh/kp_01.pem ~/Downloads/cudnn-8.0-linux-x64-v5.1.tgz ubuntu@<Public DNS>:~/.

Install on EC2 (only extracting the archive and copying the files)

$ tar zxvf cudnn-8.0-linux-x64-v5.1.tgz
$ sudo cp cuda/include/* ${CUDA_HOME}/include/.
$ sudo cp cuda/lib64/* ${CUDA_HOME}/lib64/.

Install NVIDIA CUDA Profile Tools Interface (libcupti-dev)

You can install it with apt-get.

$ sudo apt-get install libcupti-dev

However, when I ran it this time, I got the error "*** is not a symbolic link", which I resolved as follows. (Reference: http://stackoverflow.com/questions/43016255/libegl-so-1-is-not-a-symbolic-link)

$ sudo mv /usr/lib/nvidia-375/libEGL.so.1 /usr/lib/nvidia-375/libEGL.so.1.org
$ sudo ln -s /usr/lib/nvidia-375/libEGL.so.375.39 /usr/lib/nvidia-375/libEGL.so.1

$ sudo mv /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5 /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5.org
$ sudo ln -s /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5.1.10 /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.5

$ sudo mv /usr/lib32/nvidia-375/libEGL.so.1 /usr/lib32/nvidia-375/libEGL.so.1.org
$ sudo ln -s /usr/lib32/nvidia-375/libEGL.so.375.39 /usr/lib32/nvidia-375/libEGL.so.1
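
The commands above replace regular files with symbolic links, because the linker cache tool expects names such as libEGL.so.1 to be links to the fully versioned library. To find which files are affected before renaming anything, something like the following sketch could be used (the helper name and the idea of passing an explicit list of names are my own assumptions):

```python
import os


def non_symlink_sonames(directory: str, names) -> list:
    """Return the names that exist in `directory` but are not symbolic links."""
    offenders = []
    for name in names:
        path = os.path.join(directory, name)
        # lexists() is true for the entry itself, even for a dangling symlink
        if os.path.lexists(path) and not os.path.islink(path):
            offenders.append(name)
    return offenders
```

For example, `non_symlink_sonames("/usr/lib/nvidia-375", ["libEGL.so.1"])` would return `["libEGL.so.1"]` before the fix and an empty list afterwards.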

GPU settings

Apply "Optimize GPU settings (P2 instance only)" from http://docs.aws.amazon.com/ja_jp/AWSEC2/latest/UserGuide/accelerated-computing-instances.html.

$ sudo nvidia-smi -pm 1
$ sudo nvidia-smi --auto-boost-default=0
$ sudo nvidia-smi -ac 2505,875

Python environment

Create a pyenv + miniconda environment, following the referenced article ("There is actually a problem with anaconda alone.").

pyenv

URL: https://github.com/pyenv/pyenv#installation

Clone the repository with git and add the settings to ~/.bash_profile.

$ git clone https://github.com/pyenv/pyenv.git ~/.pyenv

`~/.bash_profile`


export PYENV_ROOT="${HOME}/.pyenv"
export PATH="${PYENV_ROOT}/bin${PATH:+:${PATH}}"
eval "$(pyenv init -)"

miniconda

Install the latest miniconda (here, miniconda3-4.3.11) with pyenv.

$ pyenv install -l | grep miniconda
...
(omitted)
...
  miniconda3-4.3.11

$ pyenv install miniconda3-4.3.11

`~/.bash_profile`


export CONDA_HOME="${PYENV_ROOT}/versions/miniconda3-4.3.11"
export PATH="${CONDA_HOME}/bin${PATH:+:${PATH}}"

Tensorflow

Install with Anaconda

Create an Anaconda environment with conda and install Tensorflow.

$ conda create -n tensorflow python=3.5 anaconda
$ source activate tensorflow
(tensorflow)$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-linux_x86_64.whl
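
After the install, it is worth confirming that Tensorflow can actually see the GPU. The sketch below relies on `device_lib.list_local_devices()` from `tensorflow.python.client`, which I believe is available in the 1.x releases. The filtering helper and the fake device list are made up for illustration, and the device names shown are placeholders rather than real output.

```python
from types import SimpleNamespace


def gpu_device_names(devices) -> list:
    """Return the names of all devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_local_gpus() -> list:
    """Ask Tensorflow for its local devices and keep only the GPUs."""
    # Imported lazily so gpu_device_names() can be used without Tensorflow
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())


# Fake devices, only to show the filtering without a GPU or Tensorflow present
FAKE_DEVICES = [
    SimpleNamespace(name="cpu-placeholder", device_type="CPU"),
    SimpleNamespace(name="gpu-placeholder", device_type="GPU"),
]
```

Inside the tensorflow environment on the instance, `list_local_gpus()` returning an empty list would suggest that the CUDA or cuDNN libraries were not found.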

Jupyter notebook

URL: http://jupyter-notebook.readthedocs.io/en/latest/public_server.html

Configure the Jupyter notebook running on EC2 so that it can be reached from a local PC.

Allow access with any host name
Change the port number from the default (set to 9999 here)
Connect with https
Set a password

Creating a server certificate and key file

(tensorflow)$ mkdir certificate
(tensorflow)$ cd certificate
(tensorflow)$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem

Password hash value creation

(tensorflow)$ python
>>> from notebook.auth import passwd
>>> passwd()
Enter password: 
Verify password:
'sha1:********'
>>> exit()
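
For the curious, the string returned by passwd() appears to have the layout `sha1:<salt>:<hex digest>`, where the digest is the SHA-1 of the password followed by the salt. That layout is my understanding of the legacy format, not something stated in this article, so the following standard-library sketch is only an illustration and is not a replacement for `notebook.auth.passwd`.

```python
import hashlib
import hmac
import os


def make_sha1_hash(passphrase: str, salt: str = None) -> str:
    """Build 'sha1:<salt>:<digest>' (assumed layout, see the note above)."""
    if salt is None:
        salt = os.urandom(6).hex()  # 12 random hex characters
    digest = hashlib.sha1(
        passphrase.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    return ":".join(("sha1", salt, digest))


def check_hash(hashed: str, passphrase: str) -> bool:
    """Verify a passphrase against a string produced by make_sha1_hash()."""
    parts = hashed.split(":")
    if len(parts) != 3 or parts[0] != "sha1":
        return False
    _, salt, digest = parts
    expected = hashlib.sha1(
        passphrase.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    # Constant-time comparison to avoid leaking how many characters matched
    return hmac.compare_digest(expected, digest)
```

The point is that only the salted hash, never the password itself, ends up in the configuration file.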

Jupyter configuration file creation

Output the template of the configuration file

(tensorflow)$ jupyter notebook --generate-config

Add the following settings

`~/.jupyter/jupyter_notebook_config.py`


c.NotebookApp.certfile = '/home/ubuntu/certificate/mycert.pem'
c.NotebookApp.keyfile = '/home/ubuntu/certificate/mykey.key'
c.NotebookApp.ip = '*'
c.NotebookApp.port = 9999
c.NotebookApp.open_browser = False
c.NotebookApp.password = 'sha1:********'

Start Jupyter notebook

(tensorflow)$ jupyter notebook

When you access https://<Public DNS>:9999 from the browser on your local PC, the password input screen is displayed. Enter the password you chose when creating the password hash to log in. (Because the certificate is self-signed, the browser will show a warning first.)
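
If the page does not load, a common cause is that the port is not open in the security group or that Jupyter is listening on a different port. A minimal sketch for testing TCP reachability from the local PC (the function name is hypothetical, and the host would be the instance's Public DNS):

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False
```

This only confirms that something accepts TCP connections on port 9999. It does not verify the certificate or the password.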