[PYTHON] Try to predict FX with LSTM using Keras + Tensorflow Part 2 (Calculate with GPU)

Introduction

When I wrote Predicting FX with LSTM using Keras + Tensorflow, the calculations on my ThinkPad X260 (Core i7-6500 2.5GHz, 2 cores / 4 threads) took a very long time. Deep learning has many parameters, and when every run takes this long you cannot realistically try many variations. On top of that, Forex has multiple currency pairs, and looking at only one of them is not enough, so verification was expected to take even more time.

However, both TensorFlow and Keras can be accelerated with a GPU. Running deep learning on a GPU is the natural thing to do (although, as I will touch on later, this may be changing). So I decided to use a GeForce GTX 1070.

Hardware (added 2017/6/5 7:00)

I had actually already bought the machine (I bought it to try TensorFlow on Ubuntu); its configuration is as follows.

・GeForce GTX 1070 (8GB memory)
・Core i5-6500 3.2GHz (4 cores / 4 threads)
・OS: Windows 10 64bit
・Memory: 8GB

Since the main goal is to calculate on the GPU, the GPU side is generous, but I assumed the CPU would not matter much (as I explain later, that turned out to be a mistake...). I am omitting storage, but it is an SSD (the amount of data is far too small for storage to matter).

I chose the 1070 because it has 8GB and I wanted plenty of memory on the GPU. A 1080 or 1080 Ti would probably be even faster, but I could not justify the investment because I did not know how much GPU power I would actually need, and the power supply is already at its limit. It would be nice if the card could be used for other things too, but since I only plan to use it for deep learning I did not want to spend too much. For a trial, the 1060 with 6GB of memory might be a good choice, though...

I chose Windows as the OS because I want to be able to use a similar environment elsewhere. I had already tried TensorFlow on Ubuntu (I was not using Keras at the time), and setting it up was quite painful.

I also wanted the experience so that if I buy a GPU-equipped notebook PC (Windows) in the future, I can set it up right away.

Preparation

First is installing the GPU version of TensorFlow. I would like to say it is not that difficult if you follow the TensorFlow install guide, but I ran into two problems.

The first was installing scipy. I simply could not install it with pip. I downloaded scipy-0.19.0-cp35-cp35m-win_amd64.whl from Unofficial Windows Binaries for Python Extension Packages. Once you have the file, you can install it by specifying the file name as follows.

# pip install scipy-0.19.0-cp35-cp35m-win_amd64.whl
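Just as a sanity check on my side (this is not in the original install guide), you can confirm the wheel actually installed by importing it:

import scipy
print(scipy.__version__)  # should print 0.19.0 if the wheel above was installed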

The second was the Visual Studio library. The relevant instructions were there until the week before last (around May 28, 2017), but now they seem to be gone. Why is that? ... I will look into it.

I had previously set up a GPU environment for TensorFlow on Ubuntu, and honestly it is hard to say which is easier; neither goes smoothly. The Ubuntu environment may be easier by now, though.

It is a pity that the site does not describe detailed steps such as putting the NVIDIA cuDNN library on the path.

Still, calculations on the CPU take so much time that I really had no choice but to get the GPU working.

Source

The source is already on GitHub: git clone https://github.com/rakichiki/keras_fx.git and look at keras_fx_gpu.ipynb. It includes some bug fixes compared to before. Although the file name says GPU, it also works in a CPU-only environment. Start jupyter, upload the notebook, and it will run.

The following part of keras_fx_gpu.ipynb has changed (I took it from somewhere but have forgotten where... my apologies for not linking to the original author).

Memory usage setting


import tensorflow as tf
from keras import backend as K

# Allow TensorFlow to fall back to another device if an op has no GPU kernel,
# and allocate GPU memory incrementally instead of grabbing it all at start-up.
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)
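As a side note (my own check, not part of the original notebook), one way to confirm that TensorFlow actually sees the GPU is to list the local devices:

from tensorflow.python.client import device_lib

# A device named "/gpu:0" (or "/device:GPU:0") should appear in this list
# if the GPU build of TensorFlow and the CUDA/cuDNN libraries are set up correctly.
print(device_lib.list_local_devices())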

As mentioned in the Execution and discussion section below, a GPU has less memory than the host machine. The GeForce GTX 1070, for example, has 8GB. A development PC these days can easily have 16GB (my main ThinkPad X260 has 16GB).

Deep learning, however, consumes a lot of memory, and if you do not specify anything, TensorFlow may grab most of the GPU memory up front. So unless you tell it to allocate only as much as it actually uses, as in the code above, all sorts of problems can occur.

Previously, when I used TensorFlow on a GPU under Linux and tried to run calculations in parallel, the GPU ran out of memory and new tasks could not start. So plan your memory usage.
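If you do plan to run several notebooks in parallel, another option (just a sketch of an alternative, not what the notebook above does) is to give each process a fixed share of the GPU memory instead of using allow_growth:

import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto(allow_soft_placement=True)
# Let each process use at most about 30% of the GPU memory, so roughly
# three notebooks can share one card. The 0.3 value is only an example.
config.gpu_options.per_process_gpu_memory_fraction = 0.3
K.set_session(tf.Session(config=config))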

Execution (speed only) and discussion (major correction 2017/06/06 07:15)

Let's run it. This time I am not aiming to improve the Forex profit; what I care about is how much the time to compute 9 currency pairs can be shortened. Without that, there is no telling how long it would take to see results whenever I tune parameters or modify the source.
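The times in the table below are simple wall-clock measurements. A minimal sketch of how one such measurement can be taken inside a notebook is shown here; in practice I just opened one notebook per currency pair and ran them at the same time.

import time

start = time.time()
# ... training code for one currency pair goes here ...
elapsed_min = (time.time() - start) / 60.0
print('elapsed: {:.1f} minutes'.format(elapsed_min))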

The results are shown below. GPU load and memory transfer bandwidth are the peak values observed during the calculation.

| Environment | Currency pairs | Time (minutes) | GPU load (%) | GPU memory transfer bandwidth (%) | GPU memory usage (MB) |
| --- | --- | --- | --- | --- | --- |
| Core i7-6500 2.5GHz (2 cores / 4 threads) | 1 | 27.9 | - | - | - |
| Core i7-6500 2.5GHz (2 cores / 4 threads) | 2 | 48.8 | - | - | - |
| GeForce GTX 1070 & Core i5-6500 3.2GHz (4 cores / 4 threads) | 1 | 3.1 | 50 | 15 | 526 |
| GeForce GTX 1070 & Core i5-6500 3.2GHz (4 cores / 4 threads) | 2 | 4.4 | 72 | 20 | 793 |
| GeForce GTX 1070 & Core i5-6500 3.2GHz (4 cores / 4 threads) | 3 | 5.5 | 75 | 22 | 1,069 |
| GeForce GTX 1070 & Core i5-6500 3.2GHz (4 cores / 4 threads) | 4 | 7.0 | 76 | 23 | 1,345 |
| GeForce GTX 1070 & Core i5-6500 3.2GHz (4 cores / 4 threads) | 5 | 9.0 | 76 | 23 | 1,620 |
| GeForce GTX 1070 & Core i5-6500 3.2GHz (4 cores / 4 threads) | 6 | 11.1 | 77 | 23 | 1,891 |
| GeForce GTX 1070 & Core i5-6500 3.2GHz (4 cores / 4 threads) | 7 | 12.6 | 76 | 23 | 2,079 |
| GeForce GTX 1070 & Core i5-6500 3.2GHz (4 cores / 4 threads) | 8 | 14.7 | 77 | 23 | 2,355 |
| GeForce GTX 1070 & Core i5-6500 3.2GHz (4 cores / 4 threads) | 9 | 16.5 | 76 | 23 | 2,664 |

The two environments are the ThinkPad X260 (Core i7-6500 2.5GHz, 2 cores / 4 threads) and the desktop PC (GeForce GTX 1070 & Core i5-6500 3.2GHz, 4 cores / 4 threads).

The number of currency pairs is the number of notebooks opened and run at the same time. The CPU takes too long, so I only tried up to 2 currency pairs on it.

First, comparing one currency pair, the GPU is nearly 9 times faster than the CPU (27.9 minutes vs 3.1 minutes), and close to 11 times faster with 2 currency pairs (48.8 vs 4.4). With roughly an order-of-magnitude difference in speed, the numbers are motivating enough to take on the migration even if there are big hurdles. I also gave up measuring 9 currency pairs on the CPU, since it looked like it would take more than 3 hours.

Depending on what you are computing, heavier processing will show an even bigger difference. As a rough guide, though, you can conclude that doing this kind of calculation on the CPU is a waste of time.

*: Revisiting the GPU-related part (come to think of it, I had logged the GPU-related information; looking at it makes this a little easier to explain).

Looking at these numbers, you can see that with one currency pair the GPU is not fully used.

With 3 or more currency pairs, however, the load hovers around 75%, and the GPU is not being used beyond that.

At the same time, I see no sign that memory transfer bandwidth or memory usage (out of 8GB) is the limiting factor. The GPU temperature is not in the table, but even with 9 currency pairs it only reached about 70 degrees, so it is not thermal throttling either.
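For reference, the GPU load, memory bandwidth, memory usage, and temperature figures above can be collected with nvidia-smi. A minimal polling sketch (my own, not part of the notebook):

import subprocess
import time

QUERY = 'utilization.gpu,utilization.memory,memory.used,temperature.gpu'

# Print one CSV line of GPU statistics every 5 seconds while training runs elsewhere.
while True:
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=' + QUERY, '--format=csv,noheader,nounits'])
    print(out.decode('utf-8').strip())
    time.sleep(5)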

So I took a look at the overall situation while running 9 currency pairs.

(Screenshot 9.png: CPU and GPU usage while running 9 currency pairs)

The CPU is at 100%. I had assumed the GPU version of TensorFlow / Keras would do all the work on the GPU, but that is not the case. Perhaps the loss calculation and similar steps run on the CPU?

With the GPU in place, there is a clear path to a certain amount of speed-up, but there are bottlenecks and I cannot say the GPU is being used to its full potential.

When I bought this PC, I chose the configuration thinking "the GPU does the computing, so the CPU can be cheap", but that seems to have been a mistake.

Even so, it can now handle 9 currency pairs at once. Well, I will probably be told to add Early Stopping before worrying about that...
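If I do add it, Keras makes Early Stopping a one-line callback; a minimal sketch (the monitored metric and patience value are only examples):

from keras.callbacks import EarlyStopping

# Stop training once the validation loss has not improved for 10 consecutive epochs.
early_stopping = EarlyStopping(monitor='val_loss', patience=10)

# Passed to training like:
# model.fit(x_train, y_train, validation_split=0.1, epochs=500,
#           callbacks=[early_stopping])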

Impressions

The unexpected result was that the CPU became the bottleneck and the calculation did not speed up as much as hoped, but it seems that using the GPU well can still give roughly an order-of-magnitude speed-up.

That said, the industry seems to assume that GPUs alone will not be enough, and each company is working on its own countermeasures.

Google has built its dedicated TPU, NVIDIA has put Tensor Cores on its latest GPUs, Qualcomm supports TensorFlow on the Snapdragon 835 DSP (although TensorFlow only runs on CPU and CUDA, so I do not quite follow Qualcomm's claim...), and AMD is trying to improve effective compute power with FP16 and INT8 on Vega, its latest GPU. Every company seems to be putting even more effort into speeding up deep learning.

However, neither Google's TPU nor the Snapdragon 835 is on sale to the general public, GPUs with Tensor Cores are not on sale yet (and no, I could not afford one even if they were), AMD's Vega is still not on sale, and TensorFlow only supports CUDA anyway, so I cannot use it...

Getting easy performance improvements seems to be becoming harder rather than easier, but I hope this area improves a little around next year (2018).

Finally

For now, I have a working source and an environment where it is easy to experiment. I can finally get started. From here, I will work toward results little by little.
