[PYTHON] Easily build HPC on AWS with the official AWS CfnCluster

What is CfnCluster?

A tool that makes it easy to build cluster machines on AWS. It is the official AWS counterpart of StarCluster. See the official Getting Started guide for details.

Environment

Current environment

  • ASUS C300
  • Chrome OS(linux-64)
  • miniconda3-3.18.3

AWS side preparation

Create a user in the AWS Console (https://console.aws.amazon.com/iam/home?region=ap-northeast-1#users/)

--Create a new user
--Select User > Permissions > Attach Policy > Administrator Access
  (I have not checked whether a more restrictive policy would also work.)
--Credentials > Create Access Key > Download Credentials
--Create Key Pair > Download the pem file

Install CfnCluster

--Build a virtual environment with anaconda
--It looks like it must be python 2.7

conda create -n cfn python=2.7 boto boto3  # for now, install boto from the official conda channel
source activate cfn
pip install cfncluster

Generate configure file

(cfn)chronos@localhost / $ cfncluster configure

Enter the following information in the wizard:

--Cluster Template: Name of the cluster. This time mycluster
--AWS Access Key ID: Refer to the downloaded credentials
--AWS Secret Access Key ID: Refer to the downloaded credentials
--AWS Region ID: Choices are displayed. This time ap-northeast-1
--VPC Name: The VPC name can be anything. This time test
--Key Name: Select the key pair you just created
--VPC ID: Choices are displayed, so select one of them
--Master Subnet ID: Choices are displayed, so select one of them

The resulting config file is generated at ~/.cfncluster/config:

[aws]
aws_region_name = ap-northeast-1
aws_access_key_id = xxxxxxxxxxxxxxxxx
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

[cluster mycluster]
vpc_settings = test
key_name = xxxxxx

[vpc test]
master_subnet_id = subnet-xxxxxxxx
vpc_id = vpc-xxxxxxxx

[global]
update_check = true
sanity_check = true
cluster_template = mycluster

See here for detailed settings.
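To see how the sections of the generated file refer to each other, here is a minimal sketch that reads a config of the same shape with Python 3's standard configparser module. The function name summarize_config and the inline sample are only for illustration, and the values are the same placeholders as above. This is not how cfncluster itself reads the file. Since configparser is the Python 3 module name, run it outside the python 2.7 environment.

```python
import configparser

# Sample mirroring the generated file above; all values are placeholders.
SAMPLE_CONFIG = """
[aws]
aws_region_name = ap-northeast-1

[cluster mycluster]
vpc_settings = test
key_name = xxxxxx

[vpc test]
master_subnet_id = subnet-xxxxxxxx
vpc_id = vpc-xxxxxxxx

[global]
update_check = true
sanity_check = true
cluster_template = mycluster
"""


def summarize_config(text):
    """Follow [global] -> [cluster NAME] -> [vpc NAME] and collect key values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    template = parser.get("global", "cluster_template")
    cluster_section = "cluster " + template
    vpc_section = "vpc " + parser.get(cluster_section, "vpc_settings")
    return {
        "template": template,
        "region": parser.get("aws", "aws_region_name"),
        "key_name": parser.get(cluster_section, "key_name"),
        "vpc_id": parser.get(vpc_section, "vpc_id"),
        "master_subnet_id": parser.get(vpc_section, "master_subnet_id"),
    }


summary = summarize_config(SAMPLE_CONFIG)
print(summary)
```

The point is that cluster_template in [global] selects the [cluster ...] section, and vpc_settings in that section selects the [vpc ...] section.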

Setting items that are likely to be used

Add these items under the [cluster] section of the config file. Any item you leave out is started with its default value.

--master_instance_type: Instance type of the master. Default is t2.micro
--compute_instance_type: Instance type of the compute nodes. Default is t2.micro
--initial_queue_size: Number of compute nodes to start initially. Default is 2
--maintain_initial_size: Whether to maintain the initial number of compute nodes. Note that the default is false (the cluster auto scales), which is the opposite of what you might expect
--max_queue_size: Maximum number of compute nodes when auto scaling. Default is 10
--cluster_type: Whether to run the cluster on on-demand or spot instances. Default is ondemand
  (Even if you specify spot, the master server is on-demand, because it would be a problem if it went down partway through.)
--spot_price: Bid price when cluster_type = spot. Default is 0.00
--custom_ami: Lets you specify an AMI

I added the following items and launched it as a trial

maintain_initial_size = true
initial_queue_size = 1
cluster_type = spot
compute_instance_type = m3.medium
spot_price = 0.02
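If you would rather not edit the file by hand, the same items can be added programmatically. The following is only a sketch using Python 3's standard configparser. The helper name add_cluster_settings and the shortened base config are hypothetical, and it works on a string rather than on the real ~/.cfncluster/config.

```python
import configparser
import io

# Shortened stand-in for the generated config; values are placeholders.
BASE_CONFIG = """
[cluster mycluster]
vpc_settings = test
key_name = xxxxxx
"""

# The trial settings listed above, kept as strings as they appear in the file.
TRIAL_SETTINGS = {
    "maintain_initial_size": "true",
    "initial_queue_size": "1",
    "cluster_type": "spot",
    "compute_instance_type": "m3.medium",
    "spot_price": "0.02",
}


def add_cluster_settings(config_text, cluster_name, settings):
    """Return config_text with settings added to the [cluster NAME] section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = "cluster " + cluster_name
    if not parser.has_section(section):
        raise KeyError("no section [%s]" % section)
    for key, value in settings.items():
        parser.set(section, key, str(value))
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


updated = add_cluster_settings(BASE_CONFIG, "mycluster", TRIAL_SETTINGS)
print(updated)
```

Note that configparser drops comments when it writes the file back, so keep a backup if you apply this to a real config.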

How to use CfnCluster

Start the cluster

(cfn)chronos@localhost / $ cfncluster create mycluster

It takes considerably longer than StarCluster: just under 20 minutes with 1 master and 1 compute node. The breakdown is roughly as follows:

--Security settings: 4min
--Master launch: 5min
--Spot bid: 5min
--Compute node launch: 3min
--Post-processing: 1min
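As a quick check, adding up the rough breakdown above gives the total launch time. The dictionary below just restates those approximate figures.

```python
# Rough per-step timings in minutes, taken from the breakdown above.
steps_min = {
    "Security settings": 4,
    "Master launch": 5,
    "Spot bid": 5,
    "Compute node launch": 3,
    "Post-processing": 1,
}

total_min = sum(steps_min.values())
print(total_min)  # 18
```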

If the bid is too low for the cluster to be built, the EC2 instances are left behind as zombies even if you stop cfncluster with ctrl-x. Shut the cluster down properly with cfncluster delete mycluster.
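One way to avoid forgetting the cleanup is to wrap the two commands in a small script. This is only a sketch: the function create_with_cleanup and the run parameter are hypothetical, it only uses the cfncluster create and cfncluster delete commands shown in this article, and it assumes that a non-zero exit status means the creation failed, which I have not verified for cfncluster.

```python
import subprocess


def create_with_cleanup(cluster_name, run=subprocess.call):
    """Run `cfncluster create`; if it is interrupted or fails, run `cfncluster delete`.

    `run` takes an argument list and returns an exit status. It defaults to
    subprocess.call and can be swapped out for a dry run.
    """
    try:
        status = run(["cfncluster", "create", cluster_name])
    except KeyboardInterrupt:
        status = None  # the user interrupted the creation
    if status != 0:
        # Assumption: non-zero (or interrupted) means the cluster is incomplete.
        run(["cfncluster", "delete", cluster_name])
        return False
    return True


def dry_run(cmd):
    """Print the command instead of executing it, and report success."""
    print(" ".join(cmd))
    return 0


# Dry run only; drop run=dry_run to actually call cfncluster.
ok = create_with_cleanup("mycluster", run=dry_run)
print(ok)
```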

Checking Ganglia

When a cluster is launched with CfnCluster, the server monitoring tool Ganglia is launched as well. Its URL appears in the output:

Output:"GangliaPublicURL"="http://xx.xxx.xxx.xxx/ganglia/"

[Screenshot: Ganglia web interface]
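If you save the output of the create command, the URL can be pulled out with a regular expression. The sketch below assumes every output line has exactly the Output:"Name"="Value" shape shown above. The function name parse_outputs is hypothetical, and the sample value is the masked placeholder from this article.

```python
import re

# Matches lines of the form Output:"Name"="Value"
OUTPUT_LINE = re.compile(r'Output:"([^"]+)"="([^"]*)"')


def parse_outputs(text):
    """Return a dict mapping each output name to its value."""
    return dict(OUTPUT_LINE.findall(text))


sample = 'Output:"GangliaPublicURL"="http://xx.xxx.xxx.xxx/ganglia/"'
outputs = parse_outputs(sample)
print(outputs["GangliaPublicURL"])
```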

Connect with ssh

--Make a note of the public IP address and connect with ssh as ec2-user@<public ip>
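For reference, the ssh invocation can be assembled as below. This is a sketch, not something cfncluster provides: the helper ssh_command is hypothetical, the address 203.0.113.10 is a documentation placeholder, and it assumes the downloaded pem file is passed with ssh's -i option.

```python
def ssh_command(public_ip, key_path, user="ec2-user"):
    """Build the argument list for connecting to the given public IP address."""
    return ["ssh", "-i", key_path, "%s@%s" % (user, public_ip)]


# Placeholder IP address and key file name.
cmd = ssh_command("203.0.113.10", "key.pem")
print(" ".join(cmd))
```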

Verification

qhost

[Screenshot: qhost output]

Delete cluster

(cfn)chronos@localhost / $ cfncluster delete mycluster

Impressions & doubts

--It is reassuring because it is the official tool. But why is it so slow?
--How do you explicitly increase or decrease the number of compute nodes? With update?
--Auto Scaling looks interesting, but I have not investigated it yet
--Integration with S3 still needs investigation
--That is all for today

[Memo, since I often forget] SSH connection to EC2 from a Chromebook with Secure Shell

Reference URL

--The private key is key.pem

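mkdir sshkey
cp key.pem ./sshkey/
# restrict the private key itself (not the directory) to the owner
chmod 600 ./sshkey/key.pem
# derive the public key from the private key
sudo ssh-keygen -y -f ./sshkey/key.pem > key.pub
# copy of the private key without the .pem extension, for import
cp key.pem key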

Import key (the renamed copy of key.pem) and the generated key.pub into Secure Shell.
