[Reinforcement Learning] Explosive Debut: DeepMind's Experience Replay Framework Reverb

This is a Japanese repost of a blog article I originally wrote in English.

1. Introduction

On May 26th, DeepMind released Reverb as a framework for Experience Replay in reinforcement learning. (Reference)

Reverb is an efficient and easy-to-use data storage and transport system designed for machine learning research. Reverb is primarily used as an experience replay system for distributed reinforcement learning algorithms but the system also supports multiple data structure representations such as FIFO, LIFO, and priority queues.


DeepMind's reinforcement learning framework Acme (A research framework for reinforcement learning) uses Reverb. (I will cover Acme on another occasion.)

2. Installation

At the time of writing (June 26th), Reverb officially states that it supports only Linux-based operating systems and is not yet at a production-ready level.

A development (nightly) version of TensorFlow is required; both packages can be installed from PyPI with the following command:

pip install tf-nightly==2.3.0.dev20200604 dm-reverb-nightly

3. Architecture

Reverb uses a server-client architecture, and its naming conventions feel closer to database terminology than those of other Replay Buffer implementations.

3.1 Server

The sample code on the server side is as follows.

import reverb

server = reverb.Server(tables=[
    reverb.Table(
        name='my_table',
        sampler=reverb.selectors.Uniform(),            # sample uniformly at random
        remover=reverb.selectors.Fifo(),               # remove (overwrite) the oldest items first
        max_size=100,                                  # buffer capacity
        rate_limiter=reverb.rate_limiters.MinSize(1)), # block sampling until 1 item is inserted
    ],
    port=8000
)

In this example, an ordinary Replay Buffer with a capacity of 100 (uniform sampling, overwriting from the oldest items) listens on port 8000. reverb.rate_limiters.MinSize(1) means that any sampling request is blocked until at least one item has been inserted.

3.1.1 Element selection (sampler / remover)

As you can see in the example above, Reverb lets you specify the element sampling logic and the removal (overwriting) logic independently.

The selection logic supported by Reverb is implemented in reverb.selectors; the available selectors are sketched below.
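As far as I can tell from the reverb.selectors module, the following selectors are available (a minimal sketch; the priority exponent value is just a placeholder):

import reverb

reverb.selectors.Uniform()         # uniform random sampling
reverb.selectors.Prioritized(0.8)  # priority-weighted sampling (argument: priority exponent)
reverb.selectors.Fifo()            # first in, first out
reverb.selectors.Lifo()            # last in, first out
reverb.selectors.MinHeap()         # select the item with the lowest priority
reverb.selectors.MaxHeap()         # select the item with the highest priority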

3.1.2 Constraint specification (rate_limiter)

The rate_limiter argument sets the conditions under which the Replay Buffer can be used. The rate limiters supported by Reverb are implemented in reverb.rate_limiters and are sketched below.
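Based on my reading of reverb.rate_limiters, the available rate limiters look like the following (a minimal sketch; all numbers are placeholders):

import reverb

reverb.rate_limiters.MinSize(100)     # block sampling until at least 100 items are inserted
reverb.rate_limiters.SampleToInsertRatio(
    samples_per_insert=4.0,           # target ratio of sample calls to insert calls
    min_size_to_sample=100,           # never sample before this many items exist
    error_buffer=1000.0)              # allowed deviation from the ratio before blocking
reverb.rate_limiters.Queue(100)       # intended for reverb.Table.queue
reverb.rate_limiters.Stack(100)       # intended for reverb.Table.stack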

According to the comments in the source code, reverb.rate_limiters.Queue and reverb.rate_limiters.Stack are not meant to be used directly; instead, the static methods reverb.Table.queue and reverb.Table.stack set sampler, remover, and rate_limiter appropriately so that the resulting table is a Replay Buffer with FIFO and LIFO logic, respectively.
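For example, a FIFO queue table and a LIFO stack table could be created as follows (a sketch; the table names 'my_queue' / 'my_stack', the size, and the port are arbitrary):

import reverb

server = reverb.Server(tables=[
    reverb.Table.queue(name='my_queue', max_size=100),  # FIFO behaviour
    reverb.Table.stack(name='my_stack', max_size=100),  # LIFO behaviour
    ],
    port=8001
)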

3.2 Client

The client-side sample code is below:

import reverb

client = reverb.Client('localhost:8000')  # if the server and client run on the same machine

# Example of inserting the state (observation) [0, 1] into the Replay Buffer with priority 1.0
client.insert([0, 1], priorities={'my_table': 1.0})

# Sampling returns a generator
client.sample('my_table', num_samples=2)
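Since sample returns a generator, you iterate over it to obtain the sampled items. A minimal sketch of how I expect this to be used (the exact structure of each yielded element may vary by version):

for sample in client.sample('my_table', num_samples=2):
    print(sample)  # one sampled element per iteration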

3.3 Save / Load

Reverb supports saving and loading data. By executing the following code on the client, the data currently held by the server is saved to a file and the path of the saved file is returned.

checkpoint_path = client.checkpoint()

The original state of the data can be restored by creating a server from the saved data. Note that Reverb does not check that the tables argument of the constructor matches exactly what the original server used when the data was saved, so you specify it at your own risk.

checkpointer = reverb.checkpointers.DefaultCheckpointer(path=checkpoint_path)
server = reverb.Server(tables=[...], checkpointer=checkpointer)
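Putting it together, restoring the table from section 3.1 might look like the following (a sketch; the port number is arbitrary, and the tables definition is assumed to match the original server exactly):

checkpointer = reverb.checkpointers.DefaultCheckpointer(path=checkpoint_path)
restored_server = reverb.Server(tables=[
    reverb.Table(
        name='my_table',
        sampler=reverb.selectors.Uniform(),
        remover=reverb.selectors.Fifo(),
        max_size=100,
        rate_limiter=reverb.rate_limiters.MinSize(1)),
    ],
    port=8001,
    checkpointer=checkpointer
)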

4. Finally

DeepMind's new framework for Experience Replay, Reverb, has not reached a stable release yet, but it struck me as promising for flexible, large-scale reinforcement learning.

A huge rival has suddenly appeared for my own Experience Replay library cpprb, but for smaller-scale reinforcement learning experiments I think cpprb is still more convenient and easier to use in some respects. (See my past Qiita articles.)

(Update: 2020.6.29) I investigated how to use the client and wrote it up!
