This article continues from the previous article. Last time I introduced a "nanchatte" (pseudo) DQN, an incomplete version that can be implemented with plain Tensorflow alone (as of July 2016). This time I describe how to faithfully reproduce the method actually implemented in [Mnih et al.'s 2015 paper] 1.
In particular, item 1 (the optimizer) is the problem: it is not implemented in plain Tensorflow, so you need to implement it yourself. Here I introduce how to implement it in Tensorflow and the results obtained with it.
To create a new Optimizer in Tensorflow, you need to modify both the Python and the Cpp code inside Tensorflow. Specifically, modify or add the files listed below. The basic approach is to find the existing RMSProp implementation, copy it, and modify the copy, so the implementation itself is not that difficult.
(Python side)
1. tensorflow/contrib/layers/python/layers/optimizers.py
2. tensorflow/python/training/training_ops.py
3. tensorflow/python/training/rmspropgraves.py (new)
4. tensorflow/python/training/rmspropgraves_test.py (not required, but you should test your code, so prepare it)

(Cpp side)
1. tensorflow/core/kernels/training_ops.cc
2. tensorflow/core/kernels/training_ops_gpu.cu.cc
3. tensorflow/core/ops/training_ops.cc
I have uploaded the implemented code to [GitHub] 3, so please take a look there for the specific implementation.
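For orientation, the update rule that such an optimizer computes can be sketched in plain NumPy. This is a minimal sketch of the RMSProp variant described in A. Graves's paper, not the author's Tensorflow kernel: the function name `rmsprop_graves_step`, the `state` dictionary, and the default hyperparameter values are assumptions made for illustration.

```python
import numpy as np

def rmsprop_graves_step(w, grad, state, lr=0.00025, decay=0.95, momentum=0.0, eps=0.01):
    """One RMSProp (Graves) update; returns the new weights and updates `state` in place."""
    # Running averages of the gradient and of the squared gradient
    state["g"] = decay * state["g"] + (1.0 - decay) * grad
    state["n"] = decay * state["n"] + (1.0 - decay) * grad ** 2
    # n - g^2 is a running estimate of the gradient variance;
    # eps keeps the square root away from zero
    scaled = grad / np.sqrt(state["n"] - state["g"] ** 2 + eps)
    # Optional momentum on the parameter update itself
    state["delta"] = momentum * state["delta"] - lr * scaled
    return w + state["delta"]

# Toy usage: a few steps on the objective sum(w**2), whose gradient is 2*w
w = np.array([1.0, -2.0])
state = {k: np.zeros_like(w) for k in ("g", "n", "delta")}
for _ in range(3):
    grad = 2.0 * w
    w = rmsprop_graves_step(w, grad, state)
print(w)
```

The difference from ordinary RMSProp is the subtraction of the squared running mean, so the gradient is normalized by an estimate of its standard deviation rather than by its root mean square.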
The procedure is described on the [Tensorflow official page] 4, but I got stuck in a few places, so I will write out the specific steps. The environment is Ubuntu 14.04.
First, get the source from GitHub. Then check out a release branch (here r0.9), because the master branch was unstable and could fail to build.
$ git clone https://github.com/tensorflow/tensorflow
$ cd tensorflow
$ git checkout --track origin/r0.9
Tensorflow is built with Bazel, so install it first, following the [method described on Bazel's official page] 5. The explanation on the Tensorflow page is confusing, so it is best to only skim it.
The remaining dependencies can be installed exactly as the official instructions describe.
$ sudo apt-get install python-numpy swig python-dev python-wheel
Run configure in the root directory of tensorflow (the root of the repository cloned from GitHub) to set up the build environment.
$ ./configure
(output omitted)
You will be asked several questions along the way. If you are setting up a GPU environment, answering y and pressing Enter through the prompts should generally work. For a CPU-only environment, answer n at the following prompt to disable GPU support:
Do you wish to build TensorFlow with GPU support? [y/N] n
Once you get this far, there is only a little left. Be careful, though: if you follow the official instructions as written, you will end up building an example trainer whose purpose is not obvious. Instead, build and install the whl for pip installation as shown below.
(In case of CPU environment)
$ bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
(For GPU environment)
$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ sudo pip install /tmp/tensorflow_pkg/(The name depends on which branch you built)
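To confirm that the freshly built package is the one Python picks up, a quick check along the following lines can help. This is only a sketch: the helper name `tensorflow_version` is made up for illustration, and it returns `None` when the import fails.

```python
def tensorflow_version():
    """Return the installed TensorFlow version string, or None if it cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__

# Prints the version of the installed package, or None if the install failed
print(tensorflow_version())
```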
As before, the code is posted on [GitHub] 6, so if you are interested, please take a look there. This section describes the key part of the implementation, loss clipping.
Loss clipping fixes the slope of the loss at 1 whenever the error exceeds 1. A straight line with slope 1 has derivative 1, so clipping can be realized by building a loss function that is quadratic for errors up to 1 and linear in the error beyond 1. Specifically, it is done as follows. The implementation is a slightly modified version of the one [here] 7.
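def clipped_loss(self, data, target, q_value_filter):
  # Q-values selected by q_value_filter (helper defined elsewhere in the class)
  filtered_qs = self.filtered_q_values(data, q_value_filter)
  # Absolute error between the target and the selected Q-values
  error = tf.abs(target - filtered_qs)
  # Part of the error within [0, 1]: contributes the quadratic term
  quadratic = tf.clip_by_value(error, 0.0, 1.0)
  # Part of the error beyond 1: grows linearly, so its slope stays at 1
  linear = error - quadratic
  # Sum (not mean) over the batch
  return tf.reduce_sum(0.5 * tf.square(quadratic) + linear)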
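The per-sample loss built this way coincides with the Huber loss with threshold 1, and its derivative with respect to the error saturates at -1 and +1. The following NumPy sketch checks both properties numerically. The function names are chosen for illustration only and are not part of the author's code.

```python
import numpy as np

def clipped_loss_per_sample(error):
    """Per-sample loss from the article: quadratic up to |error| = 1, linear beyond."""
    abs_error = np.abs(error)
    quadratic = np.clip(abs_error, 0.0, 1.0)
    linear = abs_error - quadratic
    return 0.5 * quadratic ** 2 + linear

def huber(error, delta=1.0):
    """Textbook Huber loss, written piecewise for comparison."""
    abs_error = np.abs(error)
    return np.where(abs_error <= delta,
                    0.5 * abs_error ** 2,
                    delta * (abs_error - 0.5 * delta))

errors = np.array([-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0])
losses = clipped_loss_per_sample(errors)

# Central-difference derivative with respect to the error
h = 1e-6
grads = (clipped_loss_per_sample(errors + h)
         - clipped_loss_per_sample(errors - h)) / (2 * h)

print(losses)
print(np.round(grads, 3))
```

Inside [-1, 1] the derivative equals the error itself, and outside it stays at -1 or +1, which is exactly the clipping behavior described above.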
The difference from implementations found elsewhere on the net is that the per-sample losses are summed rather than averaged. Reading the Nature paper, I found no clear statement that they must be averaged, and as far as my own experiments went, averaging had no benefit and only slowed convergence significantly.
Taking the average effectively reduces the learning rate, so it might lead to a higher score once learning has progressed considerably. Personally, though, as far as playing Atari games with DQN goes, I don't think it is effective.
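The effect of averaging on the step size can be illustrated with a small NumPy sketch. The batch size of 32 and the random errors are assumptions made for this illustration, and the gradient is taken with respect to a hypothetical bias-like parameter that shifts every error equally.

```python
import numpy as np

def clipped_loss_grad(error):
    """Derivative of the per-sample clipped loss with respect to the error."""
    return np.clip(error, -1.0, 1.0)

batch_size = 32  # assumed minibatch size, for illustration only
rng = np.random.RandomState(0)
errors = rng.uniform(-2.0, 2.0, size=batch_size)

# Gradient under each reduction, for a parameter that shifts every error equally
grad_sum = np.sum(clipped_loss_grad(errors))
grad_mean = np.mean(clipped_loss_grad(errors))

# The mean-reduced gradient is the sum-reduced gradient divided by the batch size
print(grad_sum, grad_mean * batch_size)
```

With plain SGD, averaging is therefore exactly equivalent to dividing the learning rate by the batch size. With an adaptive optimizer such as RMSProp, which rescales the gradient, the equivalence is only approximate.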
Except for reducing the ReplayMemory size from 1 million to 400,000, the parameters are the same. The result after 1.5 million iterations is shown below.
DQN with tensorflow 2 pic.twitter.com/JRwM0MsTDG
— EpsilonCode (@epsilon_code) August 7, 2016
The video uses different learning parameters and a different number of iterations from the incomplete version, so you might think the incomplete version could actually perform just as well. When I tried it, however, that was not the case: this setup performed overwhelmingly better. The code is on GitHub, so give it a try.
V. Mnih et al., Human-level control through deep reinforcement learning
A. Graves, Generating Sequences With Recurrent Neural Networks
[Implement DQN with Keras, TensorFlow and OpenAI Gym] 6