[PYTHON] Try running Distributed TensorFlow on Google Cloud Platform

In the previous two posts I worked through how Distributed TensorFlow operates. This time, let's finally run it on Google Cloud Platform (GCP). My motivation: with the $300 free credit you can spin up a lot of vCPU instances and get serious firepower even on a budget.

Once I actually got started, though, I found that GCP's free trial is limited to 8 vCPUs per region, so it isn't quite that much firepower after all. This time one vCPU is assigned to the interface and one to the master server, so the work is parallelized across the remaining six vCPUs. Still, it is of course faster than a single machine.

Creating a Docker image

I want to run this on GCP's Container Engine, so first I'll create Docker images.

I create a container image tf_server for the workers, which contains only the TensorFlow server built the other day, and a container image tf_if for the interface.

First, tf_server simply copies the grpc_tensorflow_server binary into /bin/ of a plain ubuntu image. I have pushed it to docker.io/ashipong/tf_server.

tf_if needs a bit more work. Copy in the wheel (whl) of the Distributed TensorFlow build made with bazel, and run

$ sudo apt-get install python python-pip python-dev
$ pip install tensorflow-0.7.1-py2-none-any.whl

With that, the distributed-capable TensorFlow itself is installed. It would also be convenient to access it from a Jupyter notebook, so install jupyter as well and configure the notebook server by writing ~/.jupyter/jupyter_notebook_config.py as described here. The password is "distributed_tensorflow". Since access is not restricted by IP, this is dangerous: anyone using this image should take every precaution, such as restricting the source IP and changing the password, as soon as possible.
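For reference, the relevant settings look roughly like the following. This is only a sketch of a jupyter_notebook_config.py from that era; the password hash shown is a placeholder, not the actual value.

c = get_config()

# Listen on all interfaces so the notebook is reachable from outside the
# container (this is exactly the risky part mentioned above).
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8888

# Hash generated with IPython.lib.passwd("distributed_tensorflow");
# the value below is a placeholder.
c.NotebookApp.password = u'sha1:xxxxxxxxxxxx:yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy'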

I have likewise pushed this image to Docker Hub, as docker.io/ashipong/tf_if.

Trying Distributed TensorFlow with Docker on OS X

Let's check that things work the same way with the Docker images. First, a single gRPC server. Start it with

$ docker run -d -p 2222:2222 ashipong/tf_server /bin/grpc_tensorflow_server  --cluster_spec='master|localhost:2222' --job_name=master --task_index=0 &

and then access it from the client like this:

import tensorflow as tf
c = tf.constant("Hello, distributed TensorFlow on docker!")
sess = tf.Session("grpc://192.168.99.100:2222")
sess.run(c)

Here 192.168.99.100 is the IP address of the docker-machine VM. It worked fine.

Next, what about two server containers? Start them as shown below.

$ docker run -d -p 2222:2222 --name tf_master ashipong/tf_server /bin/grpc_tensorflow_server --cluster_spec='master|192.168.99.100:2222,slave|192.168.99.100:2223' --job_name=master --task_index=0 &
$ docker run -d -p 2223:2222 --name tf_slave ashipong/tf_server /bin/grpc_tensorflow_server --cluster_spec='master|192.168.99.100:2222,slave|192.168.99.100:2223' --job_name=slave --task_index=0 &

Both containers listen on port 2222 inside the container; on the host side they are mapped to 2222 and 2223. Since the containers can reach each other through Docker's virtual network, the cluster_spec lists the IPs and ports as seen from the host. It is hard to put into words, but the picture is roughly like this:

docker.png

In this way you can keep adding containers wherever Docker is available, but as the procedure above shows, cluster_spec must list the IP and port of every container in the TensorFlow cluster, in a form reachable from every container. If you want to run each container on a different machine, you need to set up the network so that containers on different machines can communicate with each other.
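As a quick sanity check of the two-container cluster, a client-side snippet like the following (my own example, not from the repository) pins one op to each job named in cluster_spec:

import tensorflow as tf

# Place one op on each job defined in cluster_spec ("master" and "slave").
with tf.device("/job:master/task:0"):
    a = tf.constant(2.0)
with tf.device("/job:slave/task:0"):
    b = a * 3.0

# Connect to the master server published on the docker-machine IP.
sess = tf.Session("grpc://192.168.99.100:2222")
print(sess.run(b))  # should print 6.0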

Distributed MNIST

The model that function-approximates $y = \exp(x)$, used in the previous posts, is too lightweight: even if you increase the batch size, the load stays negligible, so a model with somewhat larger data is better suited here. That is why I parallelized the standard MNIST example. Here is the single-machine version, and [here](https://github.com/ashitani/DistributedTensorFlowSample/blob/master/gcp/mnist/mnist_distributed.py) is the distributed version of the code.

Basically, I just applied what I did last time to MNIST. There is a fair amount of repetition, so it could probably be written more efficiently.
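The actual code is in the repository linked above. As a rough illustration of the data-parallel idea, here is a minimal sketch of my own (not the repo code), assuming the job names used later on GCP (master, worker0 through worker5): the batch is split into shards, each worker job computes gradients on its shard, and the master averages and applies them.

import tensorflow as tf

num_workers = 6

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

# Keep the variables on the master job.
with tf.device("/job:master/task:0"):
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))

# Split the batch across the worker jobs (TF 0.7-era split signature).
x_shards = tf.split(0, num_workers, x)
y_shards = tf.split(0, num_workers, y_)

grad_list = []
for i in range(num_workers):
    with tf.device("/job:worker%d/task:0" % i):
        y = tf.nn.softmax(tf.matmul(x_shards[i], W) + b)
        loss = -tf.reduce_sum(y_shards[i] * tf.log(y))
        grad_list.append(tf.gradients(loss, [W, b]))

# Average the per-worker gradients on the master and apply the update.
with tf.device("/job:master/task:0"):
    mean_grads = [tf.add_n(list(g)) / num_workers for g in zip(*grad_list)]
    train_step = tf.train.GradientDescentOptimizer(0.01).apply_gradients(
        list(zip(mean_grads, [W, b])))

sess = tf.Session("grpc://master:2222")
sess.run(tf.initialize_all_variables())
# Each training step then feeds a full batch into x and y_ as usual.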

To GCP (Results)

Now, let's finally run it on GCP, starting with the results. As mentioned at the beginning, I created a cluster of eight 1-vCPU nodes and assigned one each to if (the interface) and the master, and six to the workers. The job names and the pod names are the same.

Drawn as a picture, the configuration looks like this.

gcp.png

The single-machine version looks like this:

# python mnist_single.py
step 00000, training accuracy 0.123, loss 1762.99, time 1.253 [sec/step]
step 00010, training accuracy 0.218, loss 1308.50, time 1.293 [sec/step]
step 00020, training accuracy 0.382, loss 1191.41, time 1.223 [sec/step]
step 00030, training accuracy 0.568, loss 1037.45, time 1.235 [sec/step]
step 00040, training accuracy 0.672, loss 939.53, time 1.352 [sec/step]
step 00050, training accuracy 0.742, loss 853.92, time 1.303 [sec/step]
...

The distributed version looks like this. The name of the machine running the master server is passed as an argument; this time it is simply named master.

# python mnist_distributed.py master
step 00000, training accuracy 0.130, loss 1499.24, time 0.433 [sec/step]
step 00010, training accuracy 0.597, loss 839.25, time 0.405 [sec/step]
step 00020, training accuracy 0.828, loss 437.39, time 0.478 [sec/step]
step 00030, training accuracy 0.893, loss 220.44, time 0.438 [sec/step]
step 00040, training accuracy 0.902, loss 219.77, time 0.383 [sec/step]
step 00050, training accuracy 0.942, loss 131.92, time 0.370 [sec/step]
...

That is roughly a 3x speedup. With six workers it obviously cannot be six times faster, but 3x is a big deal; it is the kind of difference you get by painting the machine red and adding a horn, so to speak. A deeper network would show its true value. The distributed version also seems to converge better, but since I did not fix the random seed this time, let's chalk that up to chance.

By the way, on my MacBook Pro the single version took about 0.8 seconds per step. So the distributed setup, free tier and all, still beats the MacBook Pro. (Of course, GCP speed varies with how busy it is.)

To GCP (Procedure)

Now let me note the GCP steps that led to the results above. I am a cloud beginner who, for some reason, started with Docker, so I may be doing silly things; please bear with me.

First, create a project from the GCP web console. Everything after that is done from the client.

First, put the IDs into environment variables and associate the project with your machine.

$ export PROJECT_ZONE=YOUR_ZONE
$ export PROJECT_ID=YOUR_PROJECT
$ gcloud config set project ${PROJECT_ID}
$ gcloud config set compute/zone ${PROJECT_ZONE}

Next, push the container image you created earlier to the Container Registry.

$ docker tag ashipong/tf_server asia.gcr.io/${PROJECT_ID}/tf_server
$ gcloud docker push asia.gcr.io/${PROJECT_ID}/tf_server
$ docker tag ashipong/tf_if asia.gcr.io/${PROJECT_ID}/tf_if
$ gcloud docker push asia.gcr.io/${PROJECT_ID}/tf_if

Create a container cluster on the Container Engine. The name is tf.

$ gcloud container clusters create tf --num-nodes 8 --machine-type n1-standard-1
$ gcloud container clusters get-credentials tf

Create pods in the cluster: one if, one master, and six workers. A pod is described by a YAML file like the master.yml below.

master.yml


apiVersion: v1
kind: Pod
metadata:
  name: master
  labels:
    app: tfserver
spec:
  containers:
    - name: master
      command: ["/bin/grpc_tensorflow_server"]
      args: ["--cluster_spec=master|master:2222,worker0|worker0:2222,worker1|worker1:2222,worker2|worker2:2222,worker3|worker3:2222,worker4|worker4:2222,worker5|worker5:2222,","--job_name=master","--task_index=0"]
      image: asia.gcr.io/${PROJECT_ID}/tf_server
      ports:
        - containerPort: 2222
  nodeSelector:
    app: master

Write a file like this and create the pod with

$ kubectl create -f master.yml

You could write each of these by hand, but preparing one for every pod is tedious, so I made a generator script. In the end the nodeSelector setting turned out to be unnecessary, but I thought tying each pod to a node would help during debugging, so I labeled the nodes and associated nodes with pods; that part is also handled by the generator script.

This time I gave each worker pod its own name: worker0, worker1, and so on. It would be nicer to define the workers as a single pod and scale the number of containers up and down while watching the load, but the current grpc_tensorflow_server has to be given every other member in its cluster_spec argument at startup, so changing the number of containers later means restarting all the servers. I expect a more sophisticated mechanism will eventually come from upstream or elsewhere.

Anyway, the following generator script labels the nodes and sets up the pods one by one.

$ python ./create_tf_servers.py
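The real create_tf_servers.py is in the repository; just to give an idea, a generator along these lines would do the job. This is my own simplified sketch (YOUR_PROJECT_ID is a placeholder, and it omits the node labeling and the if pod):

import subprocess

# Build the cluster_spec string shared by all servers.
jobs = ["master"] + ["worker%d" % i for i in range(6)]
members = ",".join("%s|%s:2222" % (j, j) for j in jobs)

template = """apiVersion: v1
kind: Pod
metadata:
  name: {name}
  labels:
    app: tfserver
spec:
  containers:
    - name: {name}
      command: ["/bin/grpc_tensorflow_server"]
      args: ["--cluster_spec={members}", "--job_name={name}", "--task_index=0"]
      image: asia.gcr.io/YOUR_PROJECT_ID/tf_server
      ports:
        - containerPort: 2222
"""

for name in jobs:
    fname = "%s.yml" % name
    with open(fname, "w") as f:
        f.write(template.format(name=name, members=members))
    subprocess.call(["kubectl", "create", "-f", fname])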

Check that the pods were created.

$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
if        1/1       Running   0          1m
master    1/1       Running   0          1m
worker0   1/1       Running   0          1m
worker1   1/1       Running   0          1m
worker2   1/1       Running   0          1m
worker3   1/1       Running   0          1m
worker4   1/1       Running   0          1m
worker5   1/1       Running   0          1m

The last thing I got stuck on was name resolution between the pods. Normally you would probably set up DNS; this time I gave up and wrote the entries into /etc/hosts. Not cool, I know.

The command below shows the IPs assigned to the pods.

$ kubectl get pods -o=yaml |grep podIP

From there I launched vi directly inside the pod and edited the file. Again, not cool.

$ kubectl exec -it master vi /etc/hosts
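If editing by hand gets tedious, a small helper like this (my own sketch, not part of the original procedure) prints the entries in hosts format from kubectl's JSON output:

import json
import subprocess

# Print "podIP<TAB>pod-name" lines suitable for pasting into /etc/hosts.
out = subprocess.check_output(["kubectl", "get", "pods", "-o", "json"])
for item in json.loads(out)["items"]:
    name = item["metadata"]["name"]
    ip = item["status"].get("podIP")
    if ip:
        print("%s\t%s" % (ip, name))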

Now that we're finally ready, log in to if and run the script.

$ kubectl exec -it if /bin/bash
# apt-get install git
# cd home
# git clone https://github.com/ashitani/DistributedTensorFlowSample
# cd DistributedTensorFlowSample/gcp/mnist
# python mnist_distributed.py master

The bash session seems to time out after about five minutes; perhaps that is just how exec /bin/bash behaves. Since a Jupyter notebook is running on if, it would probably be better to expose it as a service and work from there, but I let it go this time.

Seasoned GCP users can probably do this more elegantly, but I managed to get it working, so I will stop here. With only 8 vCPUs, that is about the limit of my motivation (laughs).

Finally

That wraps up, for now, my trilogy of posts on Distributed TensorFlow.

This time I created a cluster on GCP and ran the data-parallel version of MNIST. Unfortunately the free tier didn't let me experience truly blazing speed, but with money (laughs) I could increase the number of instances and have some fun.

For around a 10x speedup a single GPU feels sufficient, but if you are aiming for 100x or beyond, you will need both GPUs and a cluster. I can't wait for GPU instances on GCP.

To go faster still, dedicated hardware and FPGAs are presumably next. The other day Google made the bold statement that storage vendors should lower reliability and make it cheaper; put bluntly, that means taking on part of the storage factory's quality control on the user's side. One reason FPGAs are expensive is chip yield, but deep neural networks are often built to tolerate a few broken connections, so perhaps "deep learning FPGAs" that accept lower yield in exchange for drastically lower prices will hit the market before long. That is my own wishful expectation.
