[PYTHON] Chainer 2.0 is coming soon, and a separate Chainer that supports large-scale distributed processing is apparently on the way too

From the article published on Nikkei IT Pro on 2017/01/27: "PFN's deep learning framework 'Chainer' gains distributed processing support for a major speedup"


For the past year and a half I have been using Chainer personally (for study and a bit of research), and for about the last year I have been using TensorFlow for business projects.

I often wondered how the two really differed, and the one point where Chainer clearly fell short was that it did not support distributed processing across multiple nodes. That said, a bottom-tier freelancer like me rarely gets handed a dataset large enough to need distributed processing in the first place.

Seems to support at least 32 nodes / 128 GPUs

(Figure: scaling results, reprinted from the article linked above)

Training that took 20 days or more on 1 node / 1 GPU was cut to 4.4 hours on 32 nodes / 128 GPUs. Even counting "20 days or more" as exactly 20 days, that works out to roughly 109 times faster.

Moreover, getting 109 times the performance of 1 GPU out of 128 GPUs means the **effective scaling efficiency is 85.22%**, which is an **almost absurdly high figure**.
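Just to double-check the arithmetic, here is the back-of-the-envelope calculation (the 4.4 hours and 128 GPUs are the figures from the article; everything else is simple division):

```python
# Back-of-the-envelope check of the numbers quoted in the article.
single_gpu_hours = 20 * 24   # "20 days or more", counted as exactly 20 days
multi_gpu_hours = 4.4        # 32 nodes / 128 GPUs
n_gpus = 128

speedup = single_gpu_hours / multi_gpu_hours   # ~109x
efficiency = speedup / n_gpus                  # ~85.2%

print(f"speedup:    {speedup:.1f}x")
print(f"efficiency: {efficiency:.2%}")
```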

Self-proclaimed bottom-tier freelancer or not, large-scale distributed processing happens to be my specialty, and even when the data dependencies are **sparse**, reaching 85% efficiency is hard. It is no exaggeration to call it a feat to hit that efficiency in machine learning, where the data is relatively tightly coupled.

As an aside, in distributed processing we often talk about "4 for 3": the goal is to get the performance of 3 machines out of 4, in other words an effective efficiency of 75%. But even that 75% is only a target, and in practice it is not so easy to reach.

With MXNet and CNTK, an effective efficiency of around 40 to 50% is, I think, the norm, so the distributed version of Chainer must be using some impressive technology.

Forward should get more efficient with DataParallel, but Backward should lose efficiency because it requires synchronization around the loss calculation. After the loss is computed, backpropagation and the Optimizer update follow, but I wonder whether that part is also executed in a distributed fashion, or whether the weights are updated on a single node and then distributed back to each node ...
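For intuition, here is a minimal NumPy sketch of synchronous data-parallel SGD on a toy linear model. All names are hypothetical and this is not Chainer's actual implementation; it just shows the pattern the paragraph above is asking about: each worker runs forward/backward on its own shard, the gradients are averaged in a synchronization step, and a single optimizer update is then applied identically everywhere.

```python
import numpy as np

# Hypothetical synchronous data-parallel SGD on a toy linear model.
# Each "worker" owns a shard of the data; gradients are averaged
# (the all-reduce step) before one optimizer update is applied,
# so every worker ends up with identical weights.

rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 8))
true_w = rng.normal(size=8)
y = x @ true_w + 0.01 * rng.normal(size=1024)

n_workers = 4
shards = np.array_split(np.arange(len(x)), n_workers)

w = np.zeros(8)   # replicated model parameters
lr = 0.1

for step in range(200):
    # forward + backward on each worker's shard (would run in parallel)
    grads = []
    for idx in shards:
        pred = x[idx] @ w                                  # forward
        grad = 2 * x[idx].T @ (pred - y[idx]) / len(idx)   # backward (MSE gradient)
        grads.append(grad)

    # synchronization: average gradients across workers (all-reduce)
    mean_grad = np.mean(grads, axis=0)

    # optimizer update applied identically everywhere
    w -= lr * mean_grad

print("max weight error:", np.abs(w - true_w).max())
```

Whether that final update runs redundantly on every node (all-reduce style) or on a single parameter server that then broadcasts the new weights is exactly the design choice the article does not spell out.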

On top of that, performance is reported to be more than 5 times that of TensorFlow, so adoption in large-scale projects looks likely.

Incidentally, MPI and InfiniBand are used to connect the distributed nodes. As the IT Pro article notes, this is full-on supercomputer technology. Quite a different world from getting by with Protocol Buffers and 10GbE ... (laughs)
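For reference, averaging gradients across nodes maps directly onto an MPI all-reduce. A tiny mpi4py sketch (assuming mpi4py and an MPI runtime are installed; purely illustrative, not Chainer's code):

```python
# Run with e.g.: mpiexec -n 4 python allreduce_demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# pretend this is the gradient computed on this node
local_grad = np.full(4, float(rank))

# sum the gradients from all nodes, then average
global_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= size

if rank == 0:
    print("averaged gradient:", global_grad)
```

Over InfiniBand the very same call simply rides on the faster interconnect, which is presumably where much of the efficiency comes from.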

Also, speaking as one of the "ordinary people", I very much hope it will speed things up even in an environment like 1 node / 4 GPUs.

Chainer 2.0 will be released in the near future

According to the second page of the article, which is only readable by IT Pro members:

PFN will release "Chainer 2.0" soon, but it will not be the distributed version of Chainer.

Unfortunately, the IT Pro article gave no details about Chainer 2.0. I wonder how soon "soon" actually is.

In an article I wrote the other day, I mentioned that TensorFlow 1.0 will also be released soon; I hope the two will push each other and both become better frameworks. (Counting on others to do the work, as usual.)

I look forward to the follow-up report.
