Introduction

DeepMind has announced MuZero, a more general version of the AlphaZero method. It is powerful in that it can be applied not only to two-player games with clear rules but also to single-player games such as Atari games, and its performance appears to be quite high. I was out of the loop for about a year (I took childcare leave), and only learned about MuZero recently when it was in the news again, so I've been playing around with it lately. Here is what I found.

A very nice PyTorch-based implementation called muzero-general has already been published, so I'll mainly introduce that.

muzero-general

Good points

--Various games are included from the start, and it is very easy to add your own (just add one file)
--Hyperparameters are easy to adjust
--A GPU is used automatically if one is available
--You can watch reward, loss, and the speed of training and self-play in real time on TensorBoard
--It is easy to get started
--The source code is easy to read

How to install and try it

The repository's Getting started guide is worth reading, but you can get going like this:

git clone https://github.com/werner-duvaud/muzero-general.git
cd muzero-general

pip install -r requirements.txt

python muzero.py  
# -> A menu is displayed
# -> Select the game type
# -> Choose what to do: Training, LoadModel, TestPlay (MuZero vs Human), ViewPlay (MuZero vs MuZero), etc.

To start with, I suggest trying tictactoe (known in Japanese as the "marubatsu game").

How to add a game

Just add a file to the games/ directory. In it, you implement methods such as reset, step, to_play, and legal_actions in a class called Game. The directory already contains a number of Game implementations that you can use as a reference, so it is easy to see what needs to be done.
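
As a rough illustration of those four methods, here is a minimal sketch of a Game class for a toy two-player Nim game. The game itself, the stones and player attributes, get_observation, and play_random_episode are all made up for this example. The return shape of step, (observation, reward, done), is my assumption based on the bundled games, and the real files in games/ contain more than this (the hyperparameter settings, for example), so check an existing file before copying it.

```python
import random


class Game:
    """Toy two-player Nim (hypothetical example, not part of muzero-general).

    Players alternately take 1 to 3 stones; whoever takes the last stone wins.
    """

    def __init__(self, start_stones=10):
        self.start_stones = start_stones
        self.stones = start_stones
        self.player = 0  # 0 or 1

    def to_play(self):
        # The player whose turn it is.
        return self.player

    def legal_actions(self):
        # Action a means "take a + 1 stones"; you cannot take more than remain.
        return [a for a in range(3) if a + 1 <= self.stones]

    def reset(self):
        # Start a new game and return the first observation.
        self.stones = self.start_stones
        self.player = 0
        return self.get_observation()

    def step(self, action):
        # Assumed return shape: (observation, reward, done).
        if action not in self.legal_actions():
            raise ValueError(f"illegal action: {action}")
        self.stones -= action + 1
        done = self.stones == 0
        reward = 1 if done else 0  # reward goes to the player who just moved
        if not done:
            self.player = 1 - self.player
        return self.get_observation(), reward, done

    def get_observation(self):
        # A nested list stands in for an array to keep the sketch dependency-free.
        return [[[self.stones, self.player]]]


def play_random_episode(game, seed=0):
    """Play one game with uniformly random legal moves; return (winner, moves)."""
    rng = random.Random(seed)
    game.reset()
    done = False
    moves = 0
    winner = None
    while not done:
        player = game.to_play()
        action = rng.choice(game.legal_actions())
        _, _, done = game.step(action)
        moves += 1
        if done:
            winner = player
    return winner, moves


if __name__ == "__main__":
    print(play_random_episode(Game(), seed=0))
```

Playing a random episode like this before starting training is a cheap way to confirm that legal_actions and step agree with each other.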

Patch for two-player games

There may be a bug in how two-player games are implemented. If training doesn't work well, try applying this Pull Request. In my case, tictactoe was still weak after half a day of training, but after applying this PR it reached a satisfactory strength within a few hours.

A few small games

I have added a couple of games that are common as smartphone apps, so take a look if you are interested.

2048

--File: make2048.py

X2 Blocks

--File: x2blocks.py

In closing

I feel like I've found a very fun toy. If you train connect4 for one to two days, it becomes quite strong. Unless you play carefully, it will beat you with a nasty trap...

[PYTHON] Play with reinforcement learning with MuZero