[Python] Reinforcement learning in the shortest time with Keras and OpenAI Gym

Introduction

I don't really understand reinforcement learning yet, but this article is for impatient people who want to leave the theory for later and just get something moving and watch it. In other words, people like me. OpenAI Gym provides environments for reinforcement learning, so I use it here. OpenAI Gym is only the environment, though, and you need something else to do the actual learning. When I looked around, I found keras-rl, a library someone wrote for doing reinforcement learning with Keras, and it looked easy to try, so I used it. Thanks to those who came before me.

Preparing the environment

Environment used this time

At first I ran this on a server without a display, but that was a hassle, so I switched to my local machine. Incidentally, even a server without a display can apparently be made to work with Xvfb, which seems to emulate a display in virtual memory.

Installation

pip install gym
pip install keras-rl

Both can be installed with pip. This assumes Keras is already installed.

CartPole

What is CartPole

CartPole is a game in which a pole stands on a cart, and you move the cart to keep the pole balanced so that it does not fall over.

It looks like this.

[Image: Screen Shot 2017-07-23 at 1.44.51.png]

The cart can only move left and right. The cart therefore has two possible actions: right and left. Depending on the current state of the environment, you choose right or left to keep the pole balanced. You can confirm this as follows.

import gym
env = gym.make('CartPole-v0')
env.action_space
# Discrete(2)

env.action_space.sample()
# 0

The information that can be observed about the environment is as follows.

env.observation_space
# Box(4,)

env.observation_space.sample()
# array([  4.68609638e-01, 1.46450285e+38, 8.60908446e-02, 3.05459097e+37])

There are four values. In order, they are the position of the cart, the velocity of the cart, the angle of the pole, and the angular velocity of the pole. (The sampled cart and pole velocities are far too fast, aren't they?) The sample() method draws an action or an observation at random.
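To keep the four values straight, it can help to unpack an observation into named fields. The following is only a minimal sketch: the name CartPoleObservation and its field names are my own and are not part of Gym, and the numbers are the sampled values shown above.

```python
from collections import namedtuple

# Hypothetical helper (not part of gym): names for the four observation
# values, in the order described in the text.
CartPoleObservation = namedtuple(
    "CartPoleObservation",
    ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"],
)

# The values returned by env.observation_space.sample() in the example above.
raw = [4.68609638e-01, 1.46450285e+38, 8.60908446e-02, 3.05459097e+37]
obs = CartPoleObservation(*raw)

print(obs.cart_position)  # position of the cart
print(obs.pole_angle)     # angle of the pole

# Both velocities are enormous, presumably because sample() draws from the
# whole allowed range rather than from states a real episode would reach.
print(obs.cart_velocity > 1e30, obs.pole_angular_velocity > 1e30)
```

Named fields make it harder to mix up the angle and the angular velocity when you later write your own policy or logging code.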

DQN example

keras-rl includes an example that does this with DQN, so I use it as is. I wanted figures for this article, so I added just two lines (they are the two lines flagged with a trailing comment).

For background on DQN, articles such as "Get used to Keras while implementing [Python] Reinforcement Learning (DQN)" and "Reinforcement learning from zero to deep" will be helpful.

As I understand it, DQN represents the action-value function with a deep neural network. In this case, that is the function expressing that, when the pole is tilted to the right, the action of moving the cart to the right is the more valuable one.
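As a rough illustration of what an action-value function does, here is a toy version written by hand. It is not DQN and not keras-rl code: in DQN this mapping is learned by a neural network. The function names, the action encoding (0 = left, 1 = right), and the sign convention (positive angle = tilted right) are assumptions made for this sketch.

```python
# Toy stand-in for an action-value function Q(state, action).
# A hand-written rule used only to illustrate the idea.
LEFT, RIGHT = 0, 1  # assumed action encoding: 0 = push left, 1 = push right


def toy_q_values(pole_angle):
    """Return [Q(left), Q(right)] for a pole angle (positive = tilted right)."""
    # Moving the cart toward the side the pole leans gets the higher value.
    return [-pole_angle, pole_angle]


def greedy_action(q_values):
    """Pick the index of the action with the highest value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


q = toy_q_values(pole_angle=0.05)  # pole tilted slightly to the right
print(q)                           # [-0.05, 0.05]
print(greedy_action(q) == RIGHT)   # True
```

The network in the example plays the role of toy_q_values: it takes the observation and outputs one value per action.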

import numpy as np
import gym
from gym import wrappers  # added

from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

ENV_NAME = 'CartPole-v0'

# Get the environment and extract the number of actions.
env = gym.make(ENV_NAME)
env = wrappers.Monitor(env, './CartPole')  # added
np.random.seed(123)
env.seed(123)
nb_actions = env.action_space.n

# Next, we build a very simple model.
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
print(model.summary())

# Finally, we configure and compile our agent. You can use every built-in Keras optimizer and
# even the metrics!
memory = SequentialMemory(limit=50000, window_length=1)
policy = BoltzmannQPolicy()
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

# Okay, now it's time to learn something! We visualize the training here for show, but this
# slows down training quite a lot. You can always safely abort the training prematurely using
# Ctrl + C.
dqn.fit(env, nb_steps=50000, visualize=True, verbose=2)

# After training is done, we save the final weights.
dqn.save_weights('dqn_{}_weights.h5f'.format(ENV_NAME), overwrite=True)

# Finally, evaluate our algorithm for 5 episodes.
dqn.test(env, nb_episodes=5, visualize=True)

This example uses the policy `BoltzmannQPolicy()`. According to the book "Future Reinforcement Learning", this policy apparently chooses an action by applying the softmax function to the values of the action-value function. The higher an action's value, the more likely it is to be chosen.
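The softmax selection described above can be sketched with the standard library alone. This is an illustration of the idea, not the keras-rl implementation. The function names and the temperature parameter tau are assumptions of this sketch.

```python
import math
import random


def boltzmann_probabilities(q_values, tau=1.0):
    """Softmax of q_values / tau, where tau is a temperature parameter."""
    # Subtract the maximum before exponentiating for numerical stability.
    highest = max(q_values)
    exps = [math.exp((q - highest) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(q_values, tau=1.0, rng=random):
    """Sample an action index with the softmax probabilities."""
    probs = boltzmann_probabilities(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


probs = boltzmann_probabilities([1.0, 2.0])
print(probs)  # the second action gets the larger probability
print(boltzmann_action([1.0, 2.0], rng=random.Random(0)))
```

Because the action is sampled rather than always taken greedily, lower-valued actions are still tried occasionally, which gives the agent some exploration.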

Result

Episode 1

openaigym.video.0.43046.video000001.gif

An episode is the unit of learning in reinforcement learning, and one episode lasts until the outcome of the game is decided. Since this is the result of the first episode, the agent has not learned anything yet and acts completely at random.

The cart is moving to the left even though the pole is about to fall to the right.

The reason it looks a little odd is that the game ends when the pole tilts by 15 degrees or more, so nothing is drawn beyond that point. The game also ends if the cart moves too far to the left or right.
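The ending condition described above can be written as a small check. This is a sketch under stated assumptions, not Gym's own code: the 15-degree limit is the figure given in the text, the position limit is a placeholder because the article gives no number, and the angle is assumed to be in radians. Gym's actual thresholds may differ.

```python
import math

# Thresholds for this sketch. The 15-degree limit comes from the text above.
# The position limit is an assumed placeholder value.
ANGLE_LIMIT_DEGREES = 15.0
POSITION_LIMIT = 2.4  # assumed; the article gives no number


def episode_is_over(cart_position, pole_angle_radians,
                    angle_limit_degrees=ANGLE_LIMIT_DEGREES,
                    position_limit=POSITION_LIMIT):
    """Return True when the pole tilts too far or the cart moves too far."""
    tilted_too_far = abs(math.degrees(pole_angle_radians)) >= angle_limit_degrees
    moved_too_far = abs(cart_position) > position_limit
    return tilted_too_far or moved_too_far


print(episode_is_over(0.0, math.radians(5)))   # False: still balanced
print(episode_is_over(0.0, math.radians(20)))  # True: pole tilted too far
print(episode_is_over(3.0, 0.0))               # True: cart moved too far
```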

Episode 216

openaigym.video.0.43046.video000216.gif

Oh ... it's holding up ...

In closing

- I want to have it learn Mario Kart ...
