[PYTHON] Explore the maze with reinforcement learning

Introduction

In this post, I explore a maze using reinforcement learning, specifically Q-learning.

Q-learning

Overview

Put simply, Q-learning keeps a value called the Q value for each pair of "state" and "action", and updates that value using the reward received. Actions that are more likely to lead to a positive reward converge to higher Q values. In the maze, each passage square corresponds to a state, and moving up, down, left, or right corresponds to an action. In other words, memory must hold (number of passage squares) × (number of actions, 4 here) Q values. For this reason, the method does not scale easily when there are many state-action pairs, that is, when the state and action spaces explode.

This time, the maze has 60 passage squares and four possible actions (up, down, left, right), so there are about 240 state-action pairs.

Algorithm
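As a rough illustration of the memory requirement, the sketch below builds a Q table with one entry per (passage square, action) pair. The maze layout, the `MAZE` and `ACTIONS` names, and the helper functions are all hypothetical and are not taken from the author's code; the maze here is much smaller than the 60-square one in the experiment.

```python
# Hypothetical tiny maze: '#' is a wall, '.' is a passage square.
MAZE = [
    "#####",
    "#..##",
    "#.#.#",
    "#...#",
    "#####",
]
ACTIONS = ("up", "down", "left", "right")


def passage_cells(maze):
    """Return the (row, col) coordinates of every passage square."""
    return [
        (r, c)
        for r, row in enumerate(maze)
        for c, ch in enumerate(row)
        if ch == "."
    ]


def make_q_table(maze, actions=ACTIONS):
    """Create one Q value, initialized to 0, per (state, action) pair."""
    return {(cell, a): 0.0 for cell in passage_cells(maze) for a in actions}


q_table = make_q_table(MAZE)
print(len(passage_cells(MAZE)), len(q_table))  # prints: 7 28
```

With 60 passage squares instead of 7, the same construction would give 60 × 4 = 240 entries.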

Update Q value

Initially, all Q values are set to 0. Each time the agent takes action $a$ in state $s_t$, the corresponding Q value is updated as follows.

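$$
Q(s_t, a) \leftarrow Q(s_t, a) + \alpha \left( r_{t+1} + \gamma \max_{p} Q(s_{t+1}, p) - Q(s_t, a) \right)
$$

Here $\alpha$ is the learning rate, $\gamma$ is the discount factor, $r_{t+1}$ is the reward received after taking the action, and $s_{t+1}$ is the next state.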
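The update rule can be sketched in a few lines of Python. This is a minimal illustration and not the author's implementation: the dictionary-based Q table, the function name `update_q`, and the values `alpha=0.1` and `gamma=0.9` are assumptions chosen for the example.

```python
ACTIONS = ("up", "down", "left", "right")


def update_q(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q value.

    q maps (state, action) pairs to floats; missing entries count as 0,
    which matches initializing every Q value to 0.
    alpha and gamma are assumed values, not taken from the article.
    """
    # max over p of Q(s_{t+1}, p)
    best_next = max(q.get((next_state, p), 0.0) for p in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


q = {}
# Stepping into the goal at (3, 3) with a reward of 1 raises Q((3, 2), right).
first = update_q(q, state=(3, 2), action="right", reward=1.0, next_state=(3, 3))
# A later step into (3, 2) earns no reward, but picks up part of that value.
second = update_q(q, state=(3, 1), action="right", reward=0.0, next_state=(3, 2))
print(first, second)
```

The second call shows how value spreads backward from the goal: the square one step further away gets a smaller but nonzero Q value even though its own reward is 0.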

Action selection

This time we use ε-greedy action selection. With a small probability ε, the agent picks a random action; with probability 1-ε, it picks the action with the largest Q value.
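A minimal sketch of ε-greedy selection follows. The function name `epsilon_greedy`, the default `epsilon=0.1`, and the random tie-breaking among equally good actions are assumptions for illustration; the article does not say how the original code handles ties.

```python
import random

ACTIONS = ("up", "down", "left", "right")


def epsilon_greedy(q, state, epsilon=0.1, rng=random):
    """Choose an action for `state` from the Q table `q`.

    With probability epsilon, explore with a uniformly random action.
    Otherwise, exploit by taking an action with the largest Q value,
    breaking ties at random (an assumed choice).
    """
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q.get((state, a), 0.0) for a in ACTIONS)
    candidates = [a for a in ACTIONS if q.get((state, a), 0.0) == best]
    return rng.choice(candidates)


q = {((1, 1), "right"): 0.5, ((1, 1), "down"): 0.2}
print(epsilon_greedy(q, (1, 1), epsilon=0.0))  # prints: right
```

Breaking ties at random matters early in training, when every Q value is still 0: always taking the first action in the list would push the agent in one direction.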

Source code

The code is on GitHub. Run it with python map.py. I wrote it about two years ago, and it is pretty rough.

Experiment

Environment

The experimental environment is as shown in the image below. The light blue square in the lower right is the goal, the square in the upper left is the start, and the blue square is the learning agent. The agent receives a positive reward when it reaches the goal. The black squares are walls that the agent cannot enter, so it can only move through the white passages. The Q values of every cell are initialized to 0. Once a cell has a Q value greater than 0, the cell is shaded according to the largest of its four Q values, and the corresponding action is shown as an arrow.

Result
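The display rule described above can be sketched as a small helper that returns the shade and arrow for a single cell. The name `cell_display`, the ASCII arrows, and the dictionary-based Q table are hypothetical; the actual drawing code in map.py may work differently.

```python
ACTIONS = ("up", "down", "left", "right")
ARROWS = {"up": "^", "down": "v", "left": "<", "right": ">"}


def cell_display(q, state):
    """Return (shade, arrow) for one cell.

    shade is the largest of the cell's four Q values and arrow marks the
    action that has it. Cells with no Q value above 0 return (0.0, None),
    so they are drawn without shading or an arrow.
    """
    values = {a: q.get((state, a), 0.0) for a in ACTIONS}
    best_action = max(values, key=values.get)
    best_value = values[best_action]
    if best_value <= 0.0:
        return 0.0, None
    return best_value, ARROWS[best_action]


q = {((1, 1), "down"): 0.5, ((1, 1), "left"): 0.2}
print(cell_display(q, (1, 1)))  # prints: (0.5, 'v')
print(cell_display(q, (2, 2)))  # prints: (0.0, None)
```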

The experimental results are posted on YouTube. You can see the Q values propagate backward from the goal each time the agent reaches it.

Conclusion

Next, I would like to try combining Q-learning with a neural network.
