[PYTHON] Machine learning Minesweeper with PyTorch

Introduction

As practice, I'm trying to train an agent on my own Minesweeper with PyTorch. There is a lot I don't understand, so I'm taking notes as I study. This is a rough memo for now; I'll tidy it up later if needed.

Target

To reliably clear the Beginner level of the Minesweeper that (used to) come standard with Windows. For now, the goal is a win rate of about 90%.

Setup

I copied the DQN from here, so I'll keep the description brief.

The network is a sequential model. The input layer (state $s$) and the output layer ($Q_{s,a}$) each have one neuron per cell on the board. There are two hidden layers, each with (number of cells) × SIZE_MAG neurons. For now I simply scale the width with the number of cells (an arbitrary choice).

ReLU is used as the activation function, and Adam (learning rate 0.001) is used as the optimization method.
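A minimal PyTorch sketch of the network described above. It assumes the board state is fed in as a flat float vector with one entry per cell; that encoding is my assumption, since the post does not specify it.

```python
import torch
from torch import nn, optim

ROWS, COLS = 6, 6  # board size used in the first experiment
SIZE_MAG = 8       # hidden-layer width multiplier from the param list


def build_q_network(num_cells: int, size_mag: int = SIZE_MAG) -> nn.Sequential:
    """Input: one value per cell (state s). Output: one Q value per cell (Q_{s,a})."""
    hidden = num_cells * size_mag
    return nn.Sequential(
        nn.Linear(num_cells, hidden),  # hidden layer 1
        nn.ReLU(),
        nn.Linear(hidden, hidden),     # hidden layer 2
        nn.ReLU(),
        nn.Linear(hidden, num_cells),  # no activation: Q values may be negative
    )


num_cells = ROWS * COLS
model = build_q_network(num_cells)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Assumed state encoding: a batch of one flat vector, one entry per cell.
state = torch.zeros(1, num_cells)
q_values = model(state)
print(q_values.shape)  # torch.Size([1, 36])
```

The chosen action is then the index of the largest Q value, i.e. the cell to open.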

The Minesweeper game itself is my own implementation. I considered playing the real game by capturing screen images, but that isn't the point here, so I'll skip describing the game's algorithm.

The rewards are as follows.

Variable        Condition
reward_win      The game is cleared
reward_failed   The game is lost (a mine is opened)
reward_miss     The agent tries to open a square that is already open
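The reward scheme can be sketched as a small lookup. The Outcome names and the default of 0 for opening an ordinary square are hypothetical, since the game code is not shown; the two configurations use the values from the first and second runs below.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    """Hypothetical result of a single move in the game."""
    WIN = auto()     # the last safe square was opened
    FAILED = auto()  # a mine was opened
    MISS = auto()    # the chosen square was already open
    OPEN = auto()    # a new safe square was opened and the game continues


@dataclass(frozen=True)
class RewardConfig:
    win: float
    failed: float
    miss: float
    open_: float = 0.0  # assumption: ordinary opens earn 0 unless configured

    def reward(self, outcome: Outcome) -> float:
        """Map the outcome of a move to its scalar reward."""
        table = {
            Outcome.WIN: self.win,
            Outcome.FAILED: self.failed,
            Outcome.MISS: self.miss,
            Outcome.OPEN: self.open_,
        }
        return table[outcome]


first_run = RewardConfig(win=100.0, failed=-100.0, miss=-1.0)
second_run = RewardConfig(win=100.0, failed=-100.0, miss=-10.0, open_=1.0)
print(first_run.reward(Outcome.MISS), second_run.reward(Outcome.OPEN))  # -1.0 1.0
```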

Progress

First time

First, I set the board size to 6x6 and the number of mines to 5 to see whether it can learn at all.

param


GAMMA = 0.99
NUM_EPISODES = 1000
CAPACITY = 10000
BATCH_SIZE = 200
SIZE_MAG = 8

reward_failed = -100
reward_win = 100
reward_miss = -1
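CAPACITY and BATCH_SIZE refer to the experience replay memory. A minimal sketch of a typical DQN replay buffer, assuming transitions are stored as (state, action, next_state, reward); this is not necessarily the exact class in the copied DQN code.

```python
import random
from collections import deque, namedtuple

CAPACITY = 10000
BATCH_SIZE = 200

# Assumed layout of one stored step.
Transition = namedtuple("Transition", ("state", "action", "next_state", "reward"))


class ReplayMemory:
    """Fixed-size FIFO buffer: once full, the oldest transitions are discarded."""

    def __init__(self, capacity: int) -> None:
        self.memory = deque(maxlen=capacity)

    def push(self, *args) -> None:
        self.memory.append(Transition(*args))

    def sample(self, batch_size: int) -> list:
        # Uniform sampling without replacement.
        return random.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)


memory = ReplayMemory(CAPACITY)
for step in range(300):
    memory.push(step, 0, step + 1, 0.0)  # dummy transitions for illustration

# Training only starts once enough transitions have been collected.
if len(memory) >= BATCH_SIZE:
    batch = memory.sample(BATCH_SIZE)
```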

It wins very rarely, and those wins look like pure luck. The loss blew up to around four digits within about 2000 steps. Hmm... Looking at the plain reward sum as well, the agent keeps trying to open squares that are already open. Is the reward the problem?
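The "error" here is the DQN loss between $Q_{s,a}$ and the bootstrapped target. A minimal sketch of how it is typically computed, assuming a done mask and smooth L1 (Huber) loss; `td_loss` is a hypothetical helper. Without a fixed target network, the same network is passed for both arguments. Because the rewards enter the target directly, their scale (±100 here) also sets the scale of the targets.

```python
import torch
import torch.nn.functional as F
from torch import nn

GAMMA = 0.99


def td_loss(model, target_model, states, actions, rewards, next_states, dones):
    """Smooth L1 loss between Q(s, a) and r + GAMMA * max_a' Q_target(s', a')."""
    # Q values of the actions that were actually taken, shape (batch,)
    q_sa = model(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Best next-state value; zeroed for terminal transitions (dones == 1)
        next_q = target_model(next_states).max(dim=1).values
        targets = rewards + GAMMA * next_q * (1.0 - dones)
    return F.smooth_l1_loss(q_sa, targets)


num_cells = 36
model = nn.Sequential(nn.Linear(num_cells, num_cells))
states = torch.zeros(4, num_cells)
actions = torch.tensor([0, 1, 2, 3])
rewards = torch.tensor([0.0, -1.0, 100.0, -100.0])
dones = torch.tensor([0.0, 0.0, 1.0, 1.0])
# Without a target network, the same model is used on both sides.
loss = td_loss(model, model, states, actions, rewards, states, dones)
loss.backward()
```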

Second time

So I changed the rewards.

reward


reward_failed = -100
reward_win = 100
reward_miss = -10
reward_open = 1

reward_open is given when a new square is opened. The loss calmed down compared with before, but it kept oscillating around 10.

nth time

I tinkered with various settings, but the oscillation and divergence didn't go away. Looking at its behavior, it still tries to open squares that are already open. Time to introduce a fixed target Q-network...
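A fixed target Q-network is a frozen copy of the main network that is used only to compute the targets and is overwritten with the main network's weights every so often. A minimal sketch; the sync interval TARGET_UPDATE is a hypothetical value.

```python
import copy

import torch
from torch import nn

TARGET_UPDATE = 10  # hypothetical: copy weights every 10 episodes

main_q = nn.Sequential(nn.Linear(36, 288), nn.ReLU(), nn.Linear(288, 36))
target_q = copy.deepcopy(main_q)  # starts as an exact copy
target_q.eval()
for p in target_q.parameters():
    p.requires_grad_(False)  # the target is never trained directly


def sync_target(main: nn.Module, target: nn.Module) -> None:
    """Overwrite the target network with the current main-network weights."""
    target.load_state_dict(main.state_dict())


# Stand-in for a training update that changes the main network.
with torch.no_grad():
    main_q[0].weight.add_(0.5)
diverged = not torch.equal(main_q[0].weight, target_q[0].weight)

for episode in range(1, 21):
    if episode % TARGET_UPDATE == 0:
        sync_target(main_q, target_q)
```

If `sync_target` is never called, the target network keeps its initial weights for the entire run.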

After one night ...

I considered the following possibilities.

As a test, I made selecting an already-open square end the game, and the loss itself dropped below 1. Lowering ε (initial value 0.5 → 0.2) made the loss even smaller (about 0.1 to 0.01). However, this does not solve the problem with how mini-batches are built, so I'll implement Prioritized Experience Replay. I used the code as-is.
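Prioritized Experience Replay samples transitions with probability proportional to (|TD error| + eps)^alpha and corrects the resulting bias with importance-sampling weights. The following is a simplified O(n) sketch of the proportional variant (no sum tree), not the code used here; alpha, beta and eps are typical placeholder values.

```python
import random


class PrioritizedReplay:
    """Proportional prioritized experience replay (simple O(n) version, no sum tree)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha  # 0 = uniform sampling, 1 = fully proportional
        self.beta = beta    # strength of the importance-sampling correction
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0

    def push(self, transition):
        # New transitions get the current max priority so they are seen at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        n = len(self.data)
        # Importance-sampling weights, normalised so the largest in the batch is 1.
        weights = [(n * probs[i]) ** (-self.beta) for i in idx]
        w_max = max(weights)
        return idx, [self.data[i] for i in idx], [w / w_max for w in weights]

    def update(self, indices, td_errors):
        # After a training step, store the new |TD error| of each sampled transition.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps


buf = PrioritizedReplay(capacity=5)
for t in range(7):
    buf.push(t)  # the two oldest items get overwritten
buf.update([0, 1], [10.0, 0.0])
random.seed(0)
idx, batch, is_weights = buf.sample(4)
```

The importance-sampling weights are meant to multiply the per-sample losses before averaging, so frequently replayed transitions do not dominate the update.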

Even after that, it doesn't work. Well, I was hoping it was just a coding mistake... (and indeed, even the target Q-network was still at its initial values...)

So I fixed that, but it still didn't work. Are the rewards too large?

reward


reward_failed = -1
reward_win = 1
reward_miss = -1
reward_open = 1

That didn't work either...

A sudden idea

Until now, I had been training on a different board every episode. Can it learn a single fixed board? → Yes, it can. It takes about 200 to 300 episodes for the win rate to reach 90%. Amusingly, as soon as it clears the board once, the win rate shoots up to 90%.

Then what if I change the board once every 150 episodes? → It can't win at all. It seems to be held back by what it learned on the previous boards.
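The difference between these experiments is only how often the board (mine layout) is regenerated. A minimal sketch of that schedule; `make_board` and `board_schedule` are hypothetical helpers, since the game code is not published.

```python
import random
from typing import Iterator, Optional, Set, Tuple

Board = Set[Tuple[int, int]]  # set of (row, col) mine positions


def make_board(rows: int, cols: int, num_mines: int, rng: random.Random) -> Board:
    """Place num_mines mines uniformly at random on a rows x cols board."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    return set(rng.sample(cells, num_mines))


def board_schedule(num_episodes: int, rows: int, cols: int, num_mines: int,
                   change_every: Optional[int], seed: int = 0) -> Iterator[Board]:
    """Yield the board used in each episode.

    change_every=1    -> a new board every episode
    change_every=150  -> a new board every 150 episodes
    change_every=None -> one fixed board for the whole run
    """
    rng = random.Random(seed)
    board = make_board(rows, cols, num_mines, rng)
    for episode in range(num_episodes):
        if change_every is not None and episode > 0 and episode % change_every == 0:
            board = make_board(rows, cols, num_mines, rng)
        yield board


fixed = list(board_schedule(300, 6, 6, 5, change_every=None))
every_150 = list(board_schedule(300, 6, 6, 5, change_every=150))
every_episode = list(board_schedule(300, 6, 6, 5, change_every=1))
```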

Well then, back to changing the board every episode!
