Try data parallelism with Distributed TensorFlow

What is data parallelism?

Last time I went from building Distributed TensorFlow to trying model parallelism; this time I will try training with data parallelism. There are two kinds of parallelization, model parallelism and data parallelism, and roughly speaking they differ as follows.

- Model parallelism: one huge operation on 1000 pieces of data is split among 100 people
- Data parallelism: the data is divided into 10 pieces per person and shared among 100 people
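To make the data-parallel case concrete, here is a minimal stdlib sketch that splits 1000 items into 100 shards of 10. The `shard` helper is hypothetical (not a TensorFlow API); in real data parallelism each shard would be handed to a separate copy of the same model.

```python
def shard(data, num_workers):
    """Split data into num_workers contiguous, equally sized shards.

    Hypothetical helper for illustration; it assumes len(data) is divisible
    by num_workers and refuses otherwise.
    """
    if len(data) % num_workers != 0:
        raise ValueError("data size must be divisible by num_workers in this sketch")
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


data = list(range(1000))      # 1000 pieces of data
shards = shard(data, 100)     # one shard per "person" (worker)
print(len(shards), len(shards[0]))
```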

Since how model parallelism can be applied depends on the model itself, data parallelism, which simply reduces the amount of data each device handles at once, is arguably the more versatile approach.

Parameter sharing

For data parallelism in training, you make several copies of a model with identical parameters, split each batch into smaller pieces, pass one piece to each copy, and have each copy compute the gradients. In other words, every device needs a model with the same parameters, and how that is handled is a little hard to grasp. I won't use a GPU this time, but for computation and parameter sharing across multiple devices, the official How To guides "Using GPUs" (.7/how_tos/using_gpu/index.html#using-gpus) and "Sharing Variables" are helpful.

You can define a variable's scope with tf.variable_scope(). To use a variable with the same name in the same scope, call tf.get_variable() with the reuse flag set. tf.get_variable() creates a new variable when the reuse flag is not set (raising an error if a variable with that name already exists), and returns the existing variable with the same name when the flag is set. This is how parameters get shared.
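The reuse rule is easy to mix up, so here is a toy model of it in plain Python. `VariableStore` and its `get_variable` method are hypothetical stand-ins written for illustration, not TensorFlow's implementation; they only mimic the create-or-reuse behavior described above, including the fact that the initializer runs only when a variable is first created.

```python
class VariableStore:
    """Toy stand-in for TensorFlow's variable store (illustration only)."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, scope, name, initializer, reuse=False):
        full_name = f"{scope}/{name}"
        if reuse:
            # reuse=True: the variable must already exist; return the same object
            if full_name not in self._vars:
                raise ValueError(f"{full_name} does not exist; cannot reuse it")
            return self._vars[full_name]
        # reuse=False: create a new variable, refusing to shadow an existing one
        if full_name in self._vars:
            raise ValueError(f"{full_name} already exists; set reuse=True")
        self._vars[full_name] = initializer()   # initializer runs only on creation
        return self._vars[full_name]


store = VariableStore()
w_master = store.get_variable("ps", "W1", lambda: [0.0] * 16)              # created on the "master"
w_worker = store.get_variable("ps", "W1", lambda: [9.9] * 16, reuse=True)  # initializer ignored
print(w_master is w_worker)

try:
    store.get_variable("ps", "W1", lambda: [0.0] * 16)  # reuse not set: rejected
except ValueError as err:
    print("error:", err)
```

Because the worker gets back the very same object, any update the master makes is what the worker sees on its next step.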

Collections are used to refer back to parts of the graph later.

Cluster configuration

The Distributed TensorFlow white paper describes a setup in which a parameter server, the device that holds the Variables, manages and updates the parameters, while the master hands work out to each worker (sorry, I haven't read it carefully...). Skimming that part, I settled on a configuration in which the master manages all the Variables and two workers process the subdivided batches. The parameter server (ps) could be split out as its own job, but this time it is folded into the master.

- Create the variables in the ps scope on the master device
- Reuse the ps-scoped variables to compute gradients on each worker device
- On the master device, update the parameters using the average of the gradients computed by the workers
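The three steps can be simulated in a single process to see the data flow. This is a sketch under simplifying assumptions: a hypothetical linear model y_hat = w*x + b with squared-error loss and plain SGD stands in for the article's network, and ordinary function calls stand in for the gRPC devices.

```python
def worker_gradients(params, batch):
    """Step 2 (worker): gradient of the mean squared error of w*x + b over this worker's shard."""
    w, b = params["w"], params["b"]
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return {"w": gw, "b": gb}


def master_step(params, batch, num_workers=2, lr=0.1):
    """Split the batch, collect per-worker gradients, average them, update (step 3)."""
    size = len(batch) // num_workers
    shards = [batch[i * size:(i + 1) * size] for i in range(num_workers)]
    grads = [worker_gradients(params, s) for s in shards]
    avg = {k: sum(g[k] for g in grads) / num_workers for k in params}
    return {k: params[k] - lr * avg[k] for k in params}


params = {"w": 0.0, "b": 0.0}                               # step 1: variables live on the master
batch = [(x / 10, 2 * (x / 10) + 1) for x in range(10)]     # toy data from y = 2x + 1
for _ in range(2000):
    params = master_step(params, batch)
print(params)
```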

That is how the roles are divided. The figure is as follows.


Try running it

Start the gRPC servers. The cluster consists of one master and two workers, so the commands look something like this:

grpc_tensorflow_server --cluster_spec='master|localhost:2222,worker|localhost:2223,worker_|localhost:2224' --job_name=master --task_index=0 &
grpc_tensorflow_server --cluster_spec='master|localhost:2222,worker|localhost:2223,worker_|localhost:2224' --job_name=worker --task_index=0 &
grpc_tensorflow_server --cluster_spec='master|localhost:2222,worker|localhost:2223,worker_|localhost:2224' --job_name=worker_ --task_index=0 &
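As a reading aid for the `--cluster_spec` string, here is a hypothetical stdlib parser that maps each `job|host:port` entry to the `/job:NAME/task:N` device name the server plays. Note that the two workers are declared as two separate jobs (`worker` and `worker_`), each with task 0, rather than one job with two tasks. The `;` separator for several tasks inside one job is my assumption about this old binary's format and is not exercised here, since every job has a single task.

```python
def parse_cluster_spec(spec):
    """Parse 'job|addr,job|addr,...' into {job: [addr, ...]} (illustrative only).

    Assumption: multiple tasks in one job would be separated by ';'.
    """
    cluster = {}
    for job_entry in spec.split(","):
        job, tasks = job_entry.split("|", 1)
        cluster[job] = tasks.split(";")
    return cluster


spec = "master|localhost:2222,worker|localhost:2223,worker_|localhost:2224"
cluster = parse_cluster_spec(spec)
for job, addrs in cluster.items():
    for task_index, addr in enumerate(addrs):
        print(f"/job:{job}/task:{task_index} -> {addr}")
```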

Let's train last time's approximation of $y = e^x$ with data parallelism. See here for the code. By the way, I also put a single-CPU version and a model-parallel version in the same place.

To share parameters between the master and the workers, just specify the device and scope, and replace the variables created with tf.Variable() by tf.get_variable(). Since I want to reuse them, everything goes under the ps scope. To initialize a variable, you apparently pass it an initializer.

W1 = tf.Variable(tf.random_uniform([1, 16], 0.0, 1.0))  # before
W1 = tf.get_variable("W1", shape=[1, 16], initializer=tf.random_uniform_initializer(0, 1))  # after

Calling it again in the same scope with the reuse flag set reuses the variable. This part is a little confusing: the scope here is a variable scope, so only the variables are reused, while the operations in the graph are built as separate instances.

Later, in the main loop, a batch is fed to each worker, so we use a collection to be able to tell which placeholder belongs to which worker at that point.

tf.add_to_collection("x", x)    # collect x for later use
x0 = tf.get_collection("x")[0]  # extract element 0 of the "x" collection
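The idea behind collections can be shown with a toy registry: a named list that you append to while building each worker's part of the graph and index into later when feeding data. The functions and the placeholder strings below are hypothetical stand-ins for `tf.add_to_collection`, `tf.get_collection` and `tf.placeholder`, not TensorFlow's implementation.

```python
from collections import defaultdict

# Toy stand-in for a graph's named collections (illustration only).
_collections = defaultdict(list)


def add_to_collection(name, value):
    _collections[name].append(value)


def get_collection(name):
    return list(_collections[name])   # returns a copy in this sketch


# While building the graph: one input per worker, registered in order.
for worker_id in range(2):
    x = f"placeholder_x_on_worker{worker_id}"   # hypothetical stand-in for a placeholder
    add_to_collection("x", x)

# In the main loop: split the batch and feed shard i to the i-th worker's input.
batch = list(range(8))
shards = [batch[:4], batch[4:]]
feed = {get_collection("x")[i]: shards[i] for i in range(2)}
print(feed)
```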

Similarly, collect the derivatives.

Optimization consists of computing the cost, computing the gradients, and updating the parameters. Usually something like optimizer.minimize() performs the gradient computation and the parameter update in one go, but this time the gradients are not used right away, so compute_gradients() and apply_gradients() are called separately. The overall process follows these steps:

- Each worker computes gradients with compute_gradients()
- The master averages those gradients
- apply_gradients() is called with the averaged gradients

The gradient-averaging part is borrowed from the TensorFlow sample code.
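Here is a stdlib sketch of the averaging step. I am assuming the sample meant is the `average_gradients()` helper in TensorFlow's CIFAR-10 multi-GPU example, whose input is one list of (gradient, variable) pairs per tower; this version mirrors that shape, but uses plain lists as gradients and variable names instead of Variable objects, and `sgd_apply` is a hypothetical stand-in for `apply_gradients()`.

```python
def average_gradients(tower_grads):
    """tower_grads: one list of (grad, var_name) pairs per worker, same variable order.

    Returns a single list of (mean_grad, var_name) pairs.
    """
    averaged = []
    for grad_and_vars in zip(*tower_grads):          # the same variable, across workers
        grads = [g for g, _ in grad_and_vars]
        var_name = grad_and_vars[0][1]               # shared variable: take the first name
        mean = [sum(vals) / len(grads) for vals in zip(*grads)]  # element-wise mean
        averaged.append((mean, var_name))
    return averaged


def sgd_apply(params, grads_and_vars, lr=0.1):
    """Plain SGD update; hypothetical stand-in for optimizer.apply_gradients()."""
    for grad, name in grads_and_vars:
        params[name] = [p - lr * g for p, g in zip(params[name], grad)]
    return params


tower_grads = [
    [([1.0, 2.0], "W1"), ([0.5], "b1")],   # worker 0
    [([3.0, 4.0], "W1"), ([1.5], "b1")],   # worker 1
]
avg = average_gradients(tower_grads)
print(avg)   # [([2.0, 3.0], 'W1'), ([1.0], 'b1')]
params = sgd_apply({"W1": [0.0, 0.0], "b1": [0.0]}, avg)
```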

Let's compare the degree of convergence.


Blue is the single-CPU version and red is the data-parallel version. Because the random seeds are the same, the two curves almost overlap (the values are not exactly identical).
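One way to see why the curves should nearly coincide: when the loss is a mean over the batch, the mean of the per-worker mean gradients over two equal halves equals the full-batch gradient mathematically, since (1/N)·Σ g_i = ½·[(2/N)·Σ_half1 g_i + (2/N)·Σ_half2 g_i]. Only the order of the floating-point additions differs, which is one plausible (assumed, not verified against the article's runs) reason the values are not bit-identical. The toy model y_hat = w*x below is hypothetical.

```python
import random


def grad_w(w, data):
    """d/dw of the mean squared error of y_hat = w * x over data."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


random.seed(0)
xs = [random.random() for _ in range(100)]
batch = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in xs]

w = 0.5
full = grad_w(w, batch)                              # single-CPU: whole batch at once
halves = (batch[:50], batch[50:])
parallel = sum(grad_w(w, h) for h in halves) / 2     # data-parallel: average of two workers
print(full, parallel, abs(full - parallel))
```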

This example has several handicaps: I have not added any server machines, the overhead is large, and the per-batch computation is not heavy to begin with. As a result the parallel version is slower; in my environment it was roughly twice as slow.


So far the goal has been to understand the mechanism rather than to speed things up, which is why every parallel version has ended up slower. Next time I will finally tackle speed: I will package this as a Docker container and run it on Google Cloud Platform. If I spin up plenty of containers, aiming to use up the $300 free credit, it should get faster. AlphaGo, the hot topic right now, is said to have used 1200 CPUs; I'm looking forward to seeing how much $300 can do.
