[Python] Easy Reinforcement Learning (DQN) with Keras-RL

Introduction

For those who want to try reinforcement learning but would rather not implement the algorithms themselves: we will build an environment for an original problem and walk through the reinforcement learning workflow with keras-rl.

Runtime environment

Libraries used

keras

https://github.com/fchollet/keras

pip install keras

A deep learning framework often praised for how easy it makes building networks.

keras-rl https://github.com/matthiasplappert/keras-rl

A library that implements deep reinforcement learning algorithms such as DQN on top of keras. See the repository for the list of supported algorithms. Clone the git repository and install it:

git clone https://github.com/matthiasplappert/keras-rl.git
pip install ./keras-rl

OpenAI gym https://github.com/openai/gym

pip install gym

A library that provides a variety of environments for reinforcement learning. keras-rl expects the learning environment to implement the gym interface, so install it. The keras-rl examples include code that trains DQN on gym's CartPole, so it is worth trying that first.
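
To get a feel for the gym interface before we define our own environment, here is a minimal random-agent loop against CartPole (a sketch of my own, using the gym API of this article's era, in which step() returns four values):

import gym

env = gym.make('CartPole-v0')
obs = env.reset()  # start an episode and get the initial observation
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # sample a random valid action
    obs, reward, done, info = env.step(action)  # advance the environment one step
    total_reward += reward
print('episode reward:', total_reward)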

Building an environment for reinforcement learning

The environment that keras-rl trains on must implement OpenAI gym's Env. The docstring of gym's Env (https://github.com/openai/gym/blob/master/gym/core.py#L27) says:

    When implementing an environment, override the following methods
    in your subclass:
        _step
        _reset
        _render
        _close
        _configure
        _seed
    And set the following attributes:
        action_space: The Space object corresponding to valid actions
        observation_space: The Space object corresponding to valid observations
        reward_range: A tuple corresponding to the min and max possible rewards

That is what the docstring says, but at a minimum it is enough to implement the following:

_step
_reset
action_space
observation_space

As a simple example, let's take a point moving on a straight line: starting from a random initial position, we manipulate its velocity with the aim of reaching the origin.

import gym
import gym.spaces
import numpy as np

# An environment whose goal is to control the velocity of a point moving on a straight line so that it reaches the goal (the origin)
class PointOnLine(gym.core.Env):
    def __init__(self):
        self.action_space = gym.spaces.Discrete(3)  # Action space: three actions (decelerate, keep as is, accelerate)

        high = np.array([1.0, 1.0])  # Maximums of the two-dimensional observation (state) space: position and velocity
        self.observation_space = gym.spaces.Box(low=-high, high=high)  # The minimums are the negated maximums

    # Called once per step
    # Takes an action and returns the next state, the reward, and whether the episode is done
    def _step(self, action):
        # Determine the next state from the action
        dt = 0.1
        acc = (action - 1) * 0.1
        self._vel += acc * dt
        self._vel = max(-1.0, min(self._vel, 1.0))
        self._pos += self._vel * dt
        self._pos = max(-1.0, min(self._pos, 1.0))

        # The episode ends when the absolute values of position and velocity are both small enough
        done = abs(self._pos) < 0.1 and abs(self._vel) < 0.1

        if done:
            # Positive reward on success
            reward = 1.0
        else:
            # Small negative reward at every step
            # Shrinking its magnitude as the point approaches the goal makes learning progress faster
            reward = -0.01 * abs(self._pos)

        # Return the next state, the reward, whether the episode is done, and additional info
        # The dict is empty because there is no additional info
        return np.array([self._pos, self._vel]), reward, done, {}

    # Called at the start of each episode; returns the initial state
    def _reset(self):
        # The initial state is a random position with zero velocity
        self._pos = np.random.rand() * 2 - 1
        self._vel = 0.0
        return np.array([self._pos, self._vel])
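
Before handing the environment to keras-rl, a quick sanity check with random actions does not hurt (a minimal sketch of my own; env.step() here goes through gym's Env base class, which dispatches to our _step):

env = PointOnLine()
obs = env.reset()
for t in range(20):
    action = env.action_space.sample()  # random action: 0, 1, or 2
    obs, reward, done, info = env.step(action)
    print(t, obs, reward, done)
    if done:
        break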

Building and learning DQN

The code that builds and trains the DQN is written with reference to dqn_cartpole.py from the keras-rl examples.

from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

env = PointOnLine()
nb_actions = env.action_space.n

# DQN network definition
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
print(model.summary())

# Memory for experience replay
memory = SequentialMemory(limit=50000, window_length=1)
# The behavior policy is the standard epsilon-greedy; BoltzmannQPolicy, which selects actions with probabilities derived from their Q values, is also available
policy = EpsGreedyQPolicy(eps=0.1)
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=100,
               target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

history = dqn.fit(env, nb_steps=50000, visualize=False, verbose=2, nb_max_episode_steps=300)
# To visualize the progress of training, implement _render() in your Env and pass visualize=True
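
For reference, a minimal text-based _render() might look like the following sketch (my illustration, using the old-style _render(mode, close) signature that matches the _step/_reset naming above):

    # Minimal text rendering: print the current position and velocity
    # (add this method to PointOnLine)
    def _render(self, mode='human', close=False):
        if close:
            return
        print('pos: {:+.3f}  vel: {:+.3f}'.format(self._pos, self._vel))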

Testing and plotting the results

Let's test the trained agent and plot the results. keras-rl does not seem to provide a callback that stores information at each step, so we implement one ourselves, run the test, and plot the data accumulated in the callback.

import rl.callbacks
class EpisodeLogger(rl.callbacks.Callback):
    def __init__(self):
        self.observations = {}
        self.rewards = {}
        self.actions = {}

    def on_episode_begin(self, episode, logs):
        self.observations[episode] = []
        self.rewards[episode] = []
        self.actions[episode] = []
        
    def on_step_end(self, step, logs):
        episode = logs['episode']
        self.observations[episode].append(logs['observation'])
        self.rewards[episode].append(logs['reward'])
        self.actions[episode].append(logs['action'])

cb_ep = EpisodeLogger()
dqn.test(env, nb_episodes=10, visualize=False, callbacks=[cb_ep])


%matplotlib inline
import matplotlib.pyplot as plt

for obs in cb_ep.observations.values():
    plt.plot([o[0] for o in obs])
plt.xlabel("step")
plt.ylabel("pos")
Output:

Testing for 10 episodes ...
Episode 1: reward: 0.972, steps: 17
Episode 2: reward: 0.975, steps: 16
Episode 3: reward: 0.832, steps: 44
Episode 4: reward: 0.973, steps: 17
Episode 5: reward: 0.799, steps: 51
Episode 6: reward: 1.000, steps: 1
Episode 7: reward: 0.704, steps: 56
Episode 8: reward: 0.846, steps: 45
Episode 9: reward: 0.667, steps: 63
Episode 10: reward: 0.944, steps: 29

(res.png: position vs. step for each of the 10 test episodes)

The agent successfully learned to move smoothly to position 0.
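
If you want to reuse the trained agent, keras-rl agents can save and load their network weights; for example (the file name is arbitrary):

# Save the learned weights to an HDF5 file
dqn.save_weights('dqn_point_on_line_weights.h5f', overwrite=True)

# Later, rebuild the same model and agent, then restore the weights
dqn.load_weights('dqn_point_on_line_weights.h5f')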
