[PYTHON] Building an environment to run ChainerMN on a GPU instance on AWS

I wanted to try ChainerMN, so I built an environment for it on AWS. This is a record of the steps.

AWS p2 instances are not cheap, so you will want to finish building the environment quickly.

Configuration

Reference sites

- How to install CUDA on Ubuntu 16.04
- Unofficial tips for people who have trouble installing Chainer 1.5

Procedure

Preparation

$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install linux-generic
$ sudo apt-get install build-essential
$ vi .bashrc # add the following two lines
export LD_LIBRARY_PATH="/usr/local/lib:$LD_LIBRARY_PATH"
export CPATH="/usr/local/include"

Install NVIDIA Driver and CUDA

On the CUDA Toolkit download page, select Linux, x86_64, Ubuntu, 16.04, deb (network), copy the download link, and run the following:

$ wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ sudo apt-get update
$ sudo apt-get install cuda nvidia-367
$ sudo reboot
$ sudo apt-get autoremove
$ rm cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ vi .bashrc # add the following four lines
export CUDA_HOME="/usr/local/cuda-8.0"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
export CPATH="$CUDA_HOME/include:$CPATH"

Log out and log in again so that the new settings take effect.

Install cuDNN 5.1

On the cuDNN Download page, under "cuDNN v5.1 (Jan 20, 2017), for CUDA 8.0", download "cuDNN v5.1 Library for Linux" and upload it to the AWS instance.

$ tar zxvf cudnn-8.0-linux-x64-v5.1.tgz
$ sudo cp -a cuda/lib64/* $CUDA_HOME/lib64/
$ sudo cp -a cuda/include/* $CUDA_HOME/include/
$ sudo ldconfig
$ rm -rf cuda cudnn-8.0-linux-x64-v5.1.tgz

Re-create the symbolic links (added 2017-06-08)

When I ran this command from the previous section:

$ sudo ldconfig

I got the following warnings:

/sbin/ldconfig.real: /usr/lib/nvidia-375/libEGL.so.1 is not a symbolic link
/sbin/ldconfig.real: /usr/lib32/nvidia-375/libEGL.so.1 is not a symbolic link

To fix this, I re-created the symbolic links as follows.

$ sudo mv /usr/lib/nvidia-375/libEGL.so.1 /usr/lib/nvidia-375/libEGL.so.1.org
$ sudo mv /usr/lib32/nvidia-375/libEGL.so.1 /usr/lib32/nvidia-375/libEGL.so.1.org
$ sudo unlink /usr/lib/nvidia-375/libEGL.so
$ sudo ln -s /usr/lib/nvidia-375/libEGL.so.375.66 /usr/lib/nvidia-375/libEGL.so
$ sudo unlink /usr/lib32/nvidia-375/libEGL.so
$ sudo ln -s /usr/lib32/nvidia-375/libEGL.so.375.66 /usr/lib32/nvidia-375/libEGL.so
$ sudo ldconfig

I am not sure this is the correct fix, but it works for now.
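
The warnings name the files that ldconfig expected to be symbolic links. As a minimal sketch, the paths can be pulled out of the ldconfig output to see which files need attention. The function name `non_symlink_paths` is made up for this illustration, and the sample text is just the two warning lines quoted above.

```python
import re

# Matches warning lines of the form quoted above:
#   /sbin/ldconfig.real: <path> is not a symbolic link
WARNING = re.compile(r"^\S+:\s+(?P<path>\S+) is not a symbolic link$")


def non_symlink_paths(ldconfig_output):
    """Return the paths that ldconfig reported as 'not a symbolic link'."""
    paths = []
    for line in ldconfig_output.splitlines():
        match = WARNING.match(line.strip())
        if match:
            paths.append(match.group("path"))
    return paths


# The two warnings from this article, used as sample input.
sample = """\
/sbin/ldconfig.real: /usr/lib/nvidia-375/libEGL.so.1 is not a symbolic link
/sbin/ldconfig.real: /usr/lib32/nvidia-375/libEGL.so.1 is not a symbolic link
"""

for path in non_symlink_paths(sample):
    print(path)
```

After the fix, running `sudo ldconfig` again should produce no such lines, so the function would return an empty list for its output.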

Stop LightDM

$ sudo vi /etc/default/grub # change line 12 to:
GRUB_CMDLINE_LINUX="systemd.unit=multi-user.target"
$ sudo update-grub
$ sudo reboot

Install Open MPI

Copy the download link from the Open MPI site (Open MPI: Open Source High Performance Computing) and run the following:

$ wget https://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.1.tar.bz2
$ tar jxvf openmpi-2.1.1.tar.bz2
$ cd openmpi-2.1.1
$ ./configure --with-cuda
$ make -j4
$ sudo make install
$ cd
$ rm -rf openmpi-2.1.1 openmpi-2.1.1.tar.bz2
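
To confirm that the `--with-cuda` option took effect, Open MPI can report whether it was built with CUDA support: `ompi_info --parsable --all` prints a line containing `mpi_built_with_cuda_support:value:true` or `...:false`. The following is a rough sketch of reading that field from Python; the function name `built_with_cuda` is my own, and the script only prints a message if `ompi_info` is not on PATH.

```python
import shutil
import subprocess


def built_with_cuda(ompi_info_output):
    """Return True/False from parsable ompi_info output, or None if the field is absent."""
    for line in ompi_info_output.splitlines():
        if "mpi_built_with_cuda_support:value:" in line:
            # The value is the last colon-separated field on the line.
            return line.strip().rsplit(":", 1)[-1] == "true"
    return None


if __name__ == "__main__":
    if shutil.which("ompi_info") is None:
        print("ompi_info not found on PATH")
    else:
        result = subprocess.run(
            ["ompi_info", "--parsable", "--all"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            check=False,
        )
        print("built with CUDA support:", built_with_cuda(result.stdout))
```

If this prints `False` or `None`, the build probably did not find CUDA, and re-running `./configure` with the CUDA variables from .bashrc in effect would be the first thing to try.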

Install NVIDIA NCCL

$ git clone https://github.com/NVIDIA/nccl.git
$ cd nccl
$ make CUDA_HOME=/usr/local/cuda-8.0
$ sudo mkdir /usr/local/nccl
$ sudo make PREFIX=/usr/local/nccl install
$ cd
$ rm -rf nccl
$ vi .bashrc # add the following four lines
export NCCL_ROOT="/usr/local/nccl"
export CPATH="$NCCL_ROOT/include:$CPATH"
export LD_LIBRARY_PATH="$NCCL_ROOT/lib/:$LD_LIBRARY_PATH"
export LIBRARY_PATH="$NCCL_ROOT/lib/:$LIBRARY_PATH"

Log out and log in again so that the new settings take effect.
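
Because the pip installs that follow depend on these variables, it may help to confirm that the new login shell actually sees them. Here is a minimal sketch, assuming the directories are exactly the ones exported in the .bashrc steps above; the names `EXPECTED` and `missing_entries` are made up for this illustration.

```python
import os

# Directories exported in .bashrc in the steps above.
EXPECTED = {
    "PATH": ["/usr/local/cuda-8.0/bin"],
    "LD_LIBRARY_PATH": [
        "/usr/local/lib",
        "/usr/local/cuda-8.0/lib64",
        "/usr/local/nccl/lib",
    ],
    "CPATH": [
        "/usr/local/include",
        "/usr/local/cuda-8.0/include",
        "/usr/local/nccl/include",
    ],
    "LIBRARY_PATH": ["/usr/local/nccl/lib"],
}


def missing_entries(env, expected=EXPECTED):
    """Return {variable: [directories not found in it]} for the given environment."""
    missing = {}
    for name, wanted in expected.items():
        # Strip trailing slashes so "/usr/local/nccl/lib/" matches "/usr/local/nccl/lib".
        present = {p.rstrip("/") for p in env.get(name, "").split(":") if p}
        absent = [d for d in wanted if d not in present]
        if absent:
            missing[name] = absent
    return missing


if __name__ == "__main__":
    problems = missing_entries(os.environ)
    if not problems:
        print("all expected directories are set")
    for name, dirs in problems.items():
        print("{} is missing: {}".format(name, ", ".join(dirs)))
```

Run it with `python3` in the shell you will use for the pip installs. Any variable it reports should be fixed before continuing.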

Install Chainer, ChainerMN, etc.

$ sudo apt-get install python3-pip
$ sudo pip3 install --upgrade pip
$ pip3 install --user pillow h5py chainer==1.24.0
$ pip3 install --user cython
$ pip3 install --user chainermn
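
A quick way to see whether the installs succeeded is to ask Python whether each package can be found, without importing it. This is only a sketch: the function name `installed` is hypothetical, and the list of module names is my assumption about what the commands above provide (`PIL` is the import name of pillow, `Cython` that of cython).

```python
import importlib.util


def installed(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    modules = ["PIL", "h5py", "Cython", "chainer", "chainermn"]
    for name, ok in sorted(installed(modules).items()):
        print("{:<10} {}".format(name, "found" if ok else "NOT FOUND"))
```

This only shows that a package is present. It does not prove that Chainer was built with CUDA or that ChainerMN can reach MPI and NCCL, so a short training run is still the real test.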

Where I got stuck

NVIDIA driver installation

ChainerMN installation fails

- Cause: Cython did not pick up LD_LIBRARY_PATH and CPATH correctly

According to the reference "Unofficial tips for people who have trouble installing Chainer 1.5", LD_LIBRARY_PATH and CPATH must be set before Cython is installed. Also note that if you run pip with sudo, your environment variables are not inherited by root, so install with --user instead.
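
The effect of a reset environment can be illustrated without sudo. In the sketch below, a child process started with the full environment sees CPATH, while one started with CPATH removed does not. This only models the behaviour: it is an assumption that sudo on your instance resets the environment in this way (the usual default), and `cpath_seen_by_child` is a name made up for this example.

```python
import os
import subprocess
import sys

# One-line child program that reports the CPATH value it inherited.
CHILD = "import os; print(os.environ.get('CPATH', '<unset>'))"


def cpath_seen_by_child(env):
    """Start a child Python process with exactly `env` and return the CPATH it sees."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


# Environment of the login shell, with CPATH set as in .bashrc (illustrative value).
user_env = dict(os.environ, CPATH="/usr/local/cuda-8.0/include")
# Assumption: model an environment reset by dropping CPATH from the child's environment.
reset_env = {k: v for k, v in user_env.items() if k != "CPATH"}

print("child with the user's environment sees CPATH =", cpath_seen_by_child(user_env))
print("child with a reset environment sees CPATH    =", cpath_seen_by_child(reset_env))
```

A build started from the second kind of process cannot find the CUDA and NCCL headers, which matches the failure described above.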
