[PYTHON] Prepare a Chainer environment on an EC2 spot instance with AWS Lambda

The other day I wrote the article "xgboost (python) on EC2 spot instance environment is prepared by AWS Lambda". This is simply the Chainer version of it.

It would be nice if the environment could be built with a single button press or one CLI command. In short, this automates with Lambda the procedure from the following article, which I wrote earlier.

Set up AWS EC2 g2.2xlarge with a spot instance and try running chainer http://qiita.com/pyr_revs/items/e1545e6f464b712517ed

What it does

  1. Create an sh file that installs Chainer and save it to S3.
  2. Request an EC2 Spot Instance, passing UserData that installs the NVIDIA driver, CUDA, and the other dependencies.
  3. When the instance launches, UserData installs the dependencies, then downloads the sh file created in step 1 from S3 and hands processing over to ec2-user.
  4. The sh file, running with ec2-user privileges, installs Chainer.
  5. As an operation check, run Chainer's MNIST example on the GPU.
  6. Send a notification to SNS when everything is done.

Where preparation / setting is required

The IAM Role preparation, the Lambda settings, and the values to adjust in the code are almost the same as described in the article below.

xgboost (python) on EC2 Spot instance environment is prepared by AWS Lambda # Where preparation / setting is required http://qiita.com/pyr_revs/items/4cc188a63eb9313cd232#%E6%BA%96%E5%82%99%E8%A8%AD%E5%AE%9A%E3%81%8C%E5%BF%85%E8%A6%81%E3%81%AA%E3%81%A8%E3%81%93%E3%82%8D

Lambda Function

Because of g2.2xlarge pricing, the EC2 instance runs in N. Virginia (us-east-1); the Availability Zone is us-east-1d, which seems to be stable as of today. S3, SNS, and Lambda are assumed to be in the Tokyo region (ap-northeast-1).
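
Because the resources are split across two regions, every SDK client has to be created for the right one. The Lambda function in this article handles that by reassigning AWS.config.region before constructing each client. As an alternative, here is a minimal sketch that passes the region to each client constructor instead. The names makeClients and regions are hypothetical and introduced only for illustration, and AWS is passed in as a parameter (in Lambda it would be require('aws-sdk')).

```javascript
'use strict';

// Region layout described above (values taken from the article's code).
const regions = {
  ec2: 'us-east-1',
  s3: 'ap-northeast-1',
  sns: 'ap-northeast-1'
};

// Hypothetical helper: builds one client per service with an explicit region,
// so nothing depends on the order in which AWS.config.region is assigned.
// `AWS` is the SDK module, e.g. require('aws-sdk') inside Lambda.
function makeClients(AWS) {
  return {
    s3: new AWS.S3({ region: regions.s3 }),
    ec2: new AWS.EC2({ region: regions.ec2 })
  };
}

// The Availability Zone is derived from the EC2 region, as in the article.
const availabilityZone = regions.ec2 + 'd';
```

With this shape, the S3 upload and the spot request can be issued in any order without one call accidentally inheriting the other's region.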

The full Lambda function follows. Since it is long, I also posted it as a gist. https://gist.github.com/pyr-revs/31dba1c9aeff575f58b9

console.log('Launch-Chainer: Start');

var ec2Region = 'us-east-1';
var s3Region = 'ap-northeast-1';
var snsRegion = 'ap-northeast-1';

var s3Bucket = 'mybucket';
var shellScriptS3Key = 'sh/launch_chainer.sh';
var shellScriptS3Path = 's3://' + s3Bucket + '/' + shellScriptS3Key;

var cuDnnAS3Path = 's3://' + s3Bucket + '/cuda/cudnn-6.5-linux-x64-v2.tgz'; // optional

var availabilityZone = ec2Region + 'd';
var spotPrice = '0.2';
var imageId = 'ami-65116700'; // us-east-1, Amazon Linux 2015.09, HVM Instance Store 64 bit
//var imageId = 'ami-e3106686'; // us-east-1, Amazon Linux 2015.09, HVM(SSD)EBS-Backed 64 bit
//var imageId = 'ami-a22fb8a2'; // ap-northeast-1, Amazon Linux 2015.09, HVM Instance Store 64 bit
//var imageId = 'ami-9a2fb89a'; // ap-northeast-1, Amazon Linux 2015.09, HVM(SSD)EBS-Backed 64 bit

var instanceType = 'g2.2xlarge';
var iamInstanceProfile = 'my_ec2_role';
var securityGroup = 'launch-wizard-1';
var keyName = 'my_ssh_keypair';

var userData = (function () {/*#!/bin/bash
cd /root
# Update sudoers
tmp_sudoers=/root/sudoers_tmp
cat /etc/sudoers > $tmp_sudoers
cat >> $tmp_sudoers <<EOF
Defaults:ec2-user !requiretty
EOF
cat $tmp_sudoers > /etc/sudoers
# Install yum deps
yum update -y
yum groupinstall -y "Development tools"
yum -y install gcc-c++ python27-devel atlas-sse3-devel lapack-devel
yum install -y kernel-devel-`uname -r`
# Install NVIDIA Driver
wget -q http://us.download.nvidia.com/XFree86/Linux-x86_64/346.96/NVIDIA-Linux-x86_64-346.96.run
chmod +x NVIDIA-Linux-x86_64-346.96.run
./NVIDIA-Linux-x86_64-346.96.run -s > driver.log 2>&1
# Install CUDA (without driver installation... for Amazon Linux 2015.09)
wget -q http://developer.download.nvidia.com/compute/cuda/7_0/Prod/local_installers/cuda_7.0.28_linux.run
chmod +x cuda_7.0.28_linux.run
./cuda_7.0.28_linux.run -extract=/root
./cuda-linux64-rel-7.0.28-19326674.run -noprompt > cuda.log 2>&1
# Install cuDNN (Optional)
#aws s3 cp %s ./
#tar zxvf cudnn-6.5-linux-x64-v2.tgz
#cd cudnn-6.5-linux-x64-v2
#cp lib* /usr/local/cuda/lib64/
#cp cudnn.h /usr/local/cuda/include/
# Install python deps
pip install numpy
pip install six
# Update .bashrc for ec2-user
tmp_bashrc=/home/ec2-user/.bashrc_backup
cat /home/ec2-user/.bashrc > $tmp_bashrc
# Quote EOF so that $PATH is expanded when ec2-user logs in, not while this root script runs
cat >> $tmp_bashrc <<'EOF'
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
EOF
cat $tmp_bashrc > /home/ec2-user/.bashrc
# Launch post-installation script with ec2-user
aws s3 cp %s /home/ec2-user/launch_chainer.sh
chown ec2-user /home/ec2-user/launch_chainer.sh
chmod +x /home/ec2-user/launch_chainer.sh
su - ec2-user /home/ec2-user/launch_chainer.sh
*/}).toString().match(/[^]*\/\*([^]*)\*\/\}$/)[1];

var shellScriptContents = (function () {/*#!/bin/bash
cd /home/ec2-user
# Install Chainer
git clone https://github.com/pfnet/chainer
cd /home/ec2-user/chainer
sudo -s python setup.py install > setup.log 2>&1
# Run Chainer Sample with GPU
cd /home/ec2-user/chainer/examples/mnist
python train_mnist.py --gpu=0 > run.log 2>&1
# Send SNS Message
export AWS_DEFAULT_REGION=%s
aws sns publish --topic-arn arn:aws:sns:ap-northeast-1:xxxxxxxxxxxx:My-SNS-Topic --subject "Launch Chainer Done" --message "Launch Chainer Done!!"
*/}).toString().match(/[^]*\/\*([^]*)\*\/\}$/)[1];

exports.handler = function(event, context) {
    var util = require('util');
    var AWS = require('aws-sdk');
    
    // Write sh file for chainer launch to S3
    AWS.config.region = s3Region;
    var shellScriptContentsFormatted = util.format(shellScriptContents, snsRegion);
    var s3 = new AWS.S3();
    var s3Params = {Bucket: s3Bucket, Key: shellScriptS3Key, Body: shellScriptContentsFormatted};
    var s3Options = {partSize: 10 * 1024 * 1024, queueSize: 1};
    
    s3.upload(s3Params, s3Options, function(err, data) {
        if (err) {
            console.log(err, err.stack);
            context.fail('[Fail]');
        }
        else {
            console.log(data);
            
            // Launch EC2 Spot Instance with UserData
            var userDataFormatted = util.format(userData, cuDnnAS3Path, shellScriptS3Path);
            // Buffer.from replaces the deprecated new Buffer() constructor
            var userDataBase64 = Buffer.from(userDataFormatted).toString('base64');
    
            var ec2LaunchParams = {
                SpotPrice: spotPrice, 
                LaunchSpecification : {
                    IamInstanceProfile: {
                      Name: iamInstanceProfile
                    },
                    // EBS Setting (for ami-e3106686)
                    /*
                    BlockDeviceMappings : [
                        {
                          DeviceName : '/dev/xvda',
                          Ebs : { VolumeSize : 16 }
                        }
                    ],
                    */
                    // Instance Storage Setting (for ami-65116700)
                    BlockDeviceMappings : [
                        {
                          DeviceName  : '/dev/sdb',
                          VirtualName : 'ephemeral0'
                        }
                    ],
                    ImageId: imageId,
                    InstanceType: instanceType,
                    KeyName: keyName,
                    Placement: {
                      AvailabilityZone: availabilityZone
                    },
                    SecurityGroups: [
                        securityGroup
                    ],
                    UserData: userDataBase64
                }
            };
            
            AWS.config.region = ec2Region;
            var ec2 = new AWS.EC2();
            ec2.requestSpotInstances(ec2LaunchParams, function(err, data) {
                if (err) {
                    console.log(err, err.stack);
                    context.fail('[Fail]');
                }
                else {
                    console.log(data);
                    context.succeed('[Succeed]');
                }
            });
        }
    });
};
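
The userData and shellScriptContents variables above hold multi-line shell scripts by placing them in a block comment inside a function body and reading them back with toString(). The %s placeholders are then filled in by util.format. The following is a minimal sketch of that technique in isolation. The helper name heredoc and the one-line sample script are hypothetical and used only for illustration. It assumes the source is not minified, since a minifier would strip the comment.

```javascript
'use strict';
const util = require('util');

// Hypothetical helper wrapping the same extraction used in the Lambda function:
// toString() returns the function's source text, and the regular expression
// captures everything between the opening and closing comment markers.
function heredoc(fn) {
  const match = fn.toString().match(/[^]*\/\*([^]*)\*\/\}$/);
  if (!match) {
    throw new Error('No block comment found in function body');
  }
  return match[1];
}

// A shortened sample script with one %s placeholder.
const template = heredoc(function () {/*#!/bin/bash
aws s3 cp %s /home/ec2-user/launch_chainer.sh
*/});

// util.format substitutes the placeholders in order.
const script = util.format(template, 's3://mybucket/sh/launch_chainer.sh');
console.log(script);
```

Note that util.format also interprets other sequences such as %d, so a literal percent sign followed by a format letter in the shell script would be treated as a placeholder too.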

Pitfalls & changes from the previous article

NVIDIA Driver Version for Amazon Linux 2015.09

In Amazon Linux 2015.09, released the other day, extracting and installing the NVIDIA driver bundled with CUDA fails with the following dreaded message.

ERROR: Unable to build the NVIDIA kernel module

It seems the OS kernel version went up, so kernel-devel and the bundled driver no longer matched and the build failed.

NVIDIA driver download http://www.nvidia.co.jp/Download/Find.aspx?lang=jp


Among the drivers listed by that search, "346.96 / 1.9.2015" was close to the previous version number. I installed it and it worked, so it is fine for now, but I expect the same problem to recur in the future.

In the worst case, launch NVIDIA's AMI as an on-demand instance (not a spot instance) and check the driver version with nvidia-smi, as well as the CUDA version. You may also need to check the state of kernel-devel.

Disk settings

Before, to be honest, I didn't really understand the difference between EBS and instance storage, so I carelessly raised the EBS volume to 16GB to avoid running out of tmp space. I have since noticed that EBS is unnecessary if I use an "HVM Instance Store 64 bit" AMI. Currently the OS is installed directly on instance storage. With the 60GB SSD that comes with g2.2xlarge, I have no complaints for the moment.
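
The two disk setups correspond to the two BlockDeviceMappings variants in the Lambda function. As a minimal sketch, the choice can be expressed as a small function that returns the mapping for each kind of AMI. The function name blockDeviceMappings and its parameters are hypothetical. The device names and the 16GB size are the ones used in the article's code.

```javascript
'use strict';

// Hypothetical helper: returns the BlockDeviceMappings entry for the
// LaunchSpecification, depending on the root device type of the AMI.
function blockDeviceMappings(rootDeviceType, ebsSizeGiB) {
  if (rootDeviceType === 'ebs') {
    // EBS-backed AMI (e.g. ami-e3106686): enlarge the root volume.
    return [{ DeviceName: '/dev/xvda', Ebs: { VolumeSize: ebsSizeGiB || 16 } }];
  }
  if (rootDeviceType === 'instance-store') {
    // Instance-store AMI (e.g. ami-65116700): expose the bundled SSD.
    return [{ DeviceName: '/dev/sdb', VirtualName: 'ephemeral0' }];
  }
  throw new Error('Unknown root device type: ' + rootDeviceType);
}

console.log(JSON.stringify(blockDeviceMappings('instance-store')));
console.log(JSON.stringify(blockDeviceMappings('ebs')));
```

The returned array would be assigned to LaunchSpecification.BlockDeviceMappings, so switching AMIs only requires changing one argument instead of commenting blocks in and out.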

Support for Chainer 1.3 series

Now that pycuda is no longer required, setup is much easier. To check that cuDNN works, start python in interactive mode and run the following; if it prints True, it should be OK.

from chainer import cuda
print(cuda.cudnn_enabled)
