[PYTHON] Play with reinforcement learning with MuZero


DeepMind has announced MuZero, a more general method than AlphaZero. It is very powerful: it can be applied not only to two-player games with clear rules but also to single-player games such as Atari games, and its performance appears to be quite high. I was out of the loop for about a year (I took childcare leave), and only learned about it when it showed up in the news again recently, so I've been playing around with it lately. Here is what I found.

There is already a very nice PyTorch-based implementation published as a repository called muzero-general, so I'll mainly introduce that.


Good points

- Various games are included from the start, and adding your own is very easy (just add one file)
- Hyperparameters are easy to adjust
- If a GPU is available, it will be used
- You can watch reward acquisition, loss curves, and training and self-play throughput in real time on TensorBoard
- Easy to get started
- The source code is also easy to read

How to install and try it

The repository's Getting started section is the best reference, but you can get going like this.

git clone https://github.com/werner-duvaud/muzero-general.git
cd muzero-general

pip install -r requirements.txt

python muzero.py
# -> A menu is displayed
# -> Select the game
# -> Choose what to do: Training, LoadModel, TestPlay (MuZero vs Human), ViewPlay (MuZero vs MuZero), etc.

To begin with, I recommend trying tictactoe (known as "marubatsu" in Japanese).

How to add a game

Just add a file to the games/ directory. In it, you implement methods such as reset, step, to_play, and legal_actions in a class called Game. The directory already contains a number of Game implementations that you can use as a reference, so it is easy to see what needs to be done.
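As a rough illustration of the shape such a class can take, here is a minimal standalone sketch. Only the class name Game and the method names reset, step, to_play, and legal_actions come from the description above. The toy Nim rules, the observation format, the (observation, reward, done) return value of step, and the play_random_game helper are my own assumptions for illustration, so check the existing files in games/ for what the repository actually expects.

```python
import random


class Game:
    """Toy two-player Nim (hypothetical example, not part of muzero-general).

    Players alternately take 1 to 3 stones; whoever takes the last stone wins.
    """

    def __init__(self, stones=10):
        self.initial_stones = stones
        self.reset()

    def reset(self):
        """Start a new game and return the initial observation."""
        self.stones = self.initial_stones
        self.player = 0
        return self.get_observation()

    def to_play(self):
        """Return the index (0 or 1) of the player whose turn it is."""
        return self.player

    def legal_actions(self):
        """Actions 0, 1, 2 mean 'take 1, 2, or 3 stones'."""
        return [a for a in range(3) if a + 1 <= self.stones]

    def step(self, action):
        """Apply an action and return (observation, reward, done).

        The return format is an assumption made for this sketch.
        """
        if action not in self.legal_actions():
            raise ValueError(f"illegal action {action}")
        self.stones -= action + 1
        done = self.stones == 0
        reward = 1 if done else 0  # the player who just moved wins
        if not done:
            self.player = 1 - self.player
        return self.get_observation(), reward, done

    def get_observation(self):
        """Assumed observation: remaining stones and the player to move."""
        return [self.stones, self.player]


def play_random_game(seed=0):
    """Play one game with uniformly random legal moves; return the winner."""
    rng = random.Random(seed)
    game = Game()
    game.reset()
    done = False
    mover = None
    while not done:
        mover = game.to_play()
        action = rng.choice(game.legal_actions())
        _, _, done = game.step(action)
    return mover
```

Calling play_random_game() runs one full episode through exactly those four methods, which is a quick way to sanity-check a new game before spending hours on training.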

Patch for two-player games

There may be a bug in the implementation for two-player games. If training does not work well, try applying this Pull Request. In my case, tictactoe stayed weak even after half a day of training, but after applying this PR it reached a satisfactory strength in a few hours.

A few extra games

I have added a couple of games that are common as smartphone game apps, so take a look if you are interested.


- File: make2048.py

X2 Blocks

- File: x2blocks.py

In closing

I feel like I've found a very fun toy. If you train connect4 for 1 to 2 days, it becomes quite strong. Unless you play against it seriously, it will catch you in a nasty trap ...
