[PYTHON] Deep Learning with Shogi AI on Mac and Google Colab Chapter 10 6-9


One-move search AI (search1_player.py)

Where legal move filtering happens

Legal move filtering now happens earlier than it did in the policy network player. Presumably the goal is reduced memory usage? Note that, because of this, some variables have a different meaning than in the policy network player (legal_logits and logits, etc.).

features.append(make_input_features_from_board(self.board))

Output of make_input_features_from_board: the positions of the first player's pieces, the first player's pieces in hand, the positions of the second player's pieces, and the second player's pieces in hand:

[(9x9 matrix), (9x9 matrix), ... (18 + 4 + 4 + 4 + 4 + 2 + 2 planes), (9x9 matrix), (9x9 matrix), ... (18 + 4 + 4 + 4 + 4 + 2 + 2 planes)]

One such array is appended to features for every legal move: [[this array], [this array], ..., [this array]]. Since this runs after legal move filtering, the number of elements equals the number of legal moves. For example, at the initial position there are 30 legal moves, so there are 30 elements.
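To sanity-check this, you can stack the per-move feature lists into a NumPy array and look at the shape. A minimal sketch, assuming features has been filled as above; the channel count C depends on how make_input_features_from_board lays out its planes:

import numpy as np

x = np.array(features, dtype=np.float32) # one entry per legal move
print(x.shape) # e.g. (30, C, 9, 9) at the initial position: 30 legal moves, C planes of 9x9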

An example of y.data:

[[-0.04460792]
 [ 0.02167853]
 [ 0.04610606]
 ...
 [-0.09904062]]

y.data.reshape(-1):

[-0.04460792  0.02167853  0.04610606 -0.10492548 -0.22675163 -0.23193529
 -0.06671577  0.02509898 -0.02109829 -0.05519588 -0.05578787 -0.03609923
 -0.11021192 -0.10877373 -0.04065045 -0.01540023 -0.0336022  -0.03805592
 -0.03325626 -0.02194545 -0.08399387 -0.13204134 -0.2106831  -0.24970257
 -0.18735377 -0.08184412 -0.15573277 -0.00548664 -0.0353202  -0.09904062]

Since this runs after legal move filtering, the number of elements equals the number of legal moves. The output above was printed at the initial position; there are 30 legal first moves, so there are 30 elements.
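A minimal standalone sketch of what reshape(-1) and the sigmoid do here (illustrative values only; the real code uses chainer.functions.sigmoid):

import numpy as np

y = np.array([[-0.04460792], [0.02167853], [0.04610606]])
logits = y.reshape(-1) # flatten (N, 1) into (N,)
win_rates = 1.0 / (1.0 + np.exp(-logits)) # sigmoid maps each logit to a win rate in (0, 1)
print(logits)    # [-0.04460792  0.02167853  0.04610606]
print(win_rates) # values slightly below/above 0.5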

for i, move in enumerate(legal_moves): enumerate returns both the index and the value. The policy network player used make_output_label to get the index; that makes the underlying principle easier to follow, but takes more code. The value network player uses enumerate to get the index; the principle is less obvious, but the code is shorter. Either way, what is being done is the same.
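For reference, a tiny example of how enumerate pairs each element with its index (hypothetical move list):

moves = ['7g7f', '2g2f', '6i7h']
for i, move in enumerate(moves):
    print(i, move)
# 0 7g7f
# 1 2g2f
# 2 6i7h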

python-dlshogi\pydlshogi\player\search1_player.py


#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Environment settings
#-----------------------------
import socket
host = socket.gethostname() # get the hostname
# google colab  : random
# iMac          : xxxxxxxx
# Lenovo        : yyyyyyyy

# env
# 0: google colab
# 1: iMac (no GPU)
# 2: Lenovo (no GPU)

# gpu_en
# 0: disable
# 1: enable

if host == 'xxxxxxxx':
    env = 1
    gpu_en = 0
elif host == 'yyyyyyyy':
    env = 2
    gpu_en = 0
else:
    env = 0
    gpu_en = 1

# Strategy
# 'greedy'    : greedy strategy
# 'boltzmann' : softmax (Boltzmann) strategy

algorithm = 'boltzmann'

#-----------------------------

import numpy as np
import chainer
from chainer import serializers
import chainer.functions as F
if gpu_en == 1:
    from chainer import cuda, Variable

import shogi

from pydlshogi.common import *
from pydlshogi.features import *
from pydlshogi.network.value import *
from pydlshogi.player.base_player import *

def greedy(logits): # Returns the index of the largest element of the given list
                    # In a neural network, "logits" are the values before the activation function.
    return np.argmax(logits)
    # In the policy network player this was written logits.index(max(logits)); same meaning.
    # The code is being simplified little by little?

def boltzmann(logits, temperature):
    logits /= temperature  # a /= b means a = a / b
    logits -= logits.max() # a -= b means a = a - b; all values become <= 0, with the maximum at 0
    probabilities = np.exp(logits) # exp of values <= 0
    probabilities /= probabilities.sum()
    return np.random.choice(len(logits), p=probabilities) # choice(n, p=b) randomly returns an integer from 0 to n-1 with probabilities b
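# Note: dividing by the temperature sharpens (temperature < 1) or flattens
# (temperature > 1) the distribution; subtracting the max before exp is only
# for numerical stability and does not change the resulting probabilities.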

class Search1Player(BasePlayer):
    def __init__(self):
        super().__init__()
        if env == 0:
            self.modelfile = '/content/drive/My Drive/・ ・ ・/python-dlshogi/model/model_value'
        elif env == 1:
            self.modelfile = r'/Users/・ ・ ・/python-dlshogi/model/model_value' # value network model produced by training
        elif env == 2:
            self.modelfile = r"C:\Users\・ ・ ・\python-dlshogi\model\model_value"
        self.model = None

    def usi(self): # GUI side: sends the usi command after startup. Engine side: returns id (and option) and usiok.
        print('id name search1_player')
        print('option name modelfile type string default ' + self.modelfile)
        print('usiok')

    def setoption(self, option):
        if option[1] == 'modelfile':
            self.modelfile = option[3]

    def isready(self): # GUI side: sends the isready command before the game starts. Engine side: initializes and returns readyok.
        if self.model is None:
            self.model = ValueNetwork()
            if gpu_en == 1:
                self.model.to_gpu()
        serializers.load_npz(self.modelfile, self.model)
        print('readyok')

    def go(self):
        if self.board.is_game_over():
            print('bestmove resign')
            return

        # For every legal move:
        # legal move filtering now happens earlier than in the policy network player.
        # Presumably the goal is reduced memory usage?
        # Because of this, some variables differ in meaning from the policy network player (legal_logits and logits).
        legal_moves = []
        features = []
        for move in self.board.legal_moves:
            legal_moves.append(move)

            self.board.push(move) # play one move

            features.append(make_input_features_from_board(self.board))
            # Output of make_input_features_from_board: positions of the first player's
            # pieces, the first player's pieces in hand, positions of the second player's
            # pieces, and the second player's pieces in hand:
            # [(9x9 matrix),
            #  (9x9 matrix), ... (18 + 4 + 4 + 4 + 4 + 2 + 2 planes),
            #  (9x9 matrix),
            #  (9x9 matrix), ... (18 + 4 + 4 + 4 + 4 + 2 + 2 planes)]
            # One such array is appended to features for every legal move:
            # [[this array], [this array], ..., [this array]]
            # The number of elements equals the number of legal moves, e.g. 30 at the initial position.

            self.board.pop() # undo one move

        if gpu_en == 1:
            x = Variable(cuda.to_gpu(np.array(features, dtype=np.float32)))
        elif gpu_en == 0:
            x = np.array(features, dtype=np.float32)

        # Negate the sign to get the win rate from the side to move: the model evaluates
        # the position after the move (opponent to move), so the sign must be flipped.
        with chainer.no_backprop_mode():
            y = -self.model(x)

            if gpu_en == 1:
                logits = cuda.to_cpu(y.data).reshape(-1) # reshape(-1) flattens into a 1-D array
                probabilities = cuda.to_cpu(F.sigmoid(y).data).reshape(-1)
            elif gpu_en == 0:
                logits = y.data.reshape(-1) # Incidentally, the number of elements of y.data equals the number of legal moves, e.g. 30 for the first move.
                probabilities = F.sigmoid(y).data.reshape(-1)
                # An example of y.data:
                # [[-0.04460792]
                #  [ 0.02167853]
                #  [ 0.04610606]
                #  ...
                #  [-0.09904062]]
                #
                # y.data.reshape(-1):
                # [-0.04460792  0.02167853  0.04610606 -0.10492548 -0.22675163 -0.23193529
                #  -0.06671577  0.02509898 -0.02109829 -0.05519588 -0.05578787 -0.03609923
                #  -0.11021192 -0.10877373 -0.04065045 -0.01540023 -0.0336022  -0.03805592
                #  -0.03325626 -0.02194545 -0.08399387 -0.13204134 -0.2106831  -0.24970257
                #  -0.18735377 -0.08184412 -0.15573277 -0.00548664 -0.0353202  -0.09904062]
                # The number of elements equals the number of legal moves. The above is from
                # the initial position: 30 legal moves, so 30 elements.

            for i, move in enumerate(legal_moves):
            # enumerate returns both the index and the value.
            # The policy network player used make_output_label to get the index:
            # the underlying principle is easier to see, but the code is longer.
            # The value network player uses enumerate to get the index:
            # the principle is less obvious, but the code is shorter. Both do the same thing.
                # Show the win rate for each move
                print('info string {:5} : {:.5f}'.format(move.usi(), probabilities[i]))
                print(y.data)             # debug output: raw values
                print(y.data.reshape(-1)) # debug output: flattened values

        if algorithm == 'greedy':
            # (1) Pick the move with the highest value (greedy strategy): simply take the element with the maximum value.
            selected_index = greedy(logits)
        elif algorithm == 'boltzmann':
            # (2) Pick a move according to probability (softmax strategy): higher-valued elements are chosen more often, at random.
            selected_index = boltzmann(np.array(logits, dtype=np.float32), 0.5)

        bestmove = legal_moves[selected_index]

        print('bestmove', bestmove.usi())
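To get a feel for the Boltzmann (softmax) selection, here is a small standalone sketch (hypothetical logits; counts vary from run to run) showing that a lower temperature concentrates the choice on the move with the largest value:

import numpy as np
from collections import Counter

def boltzmann(logits, temperature):
    logits = logits / temperature
    logits = logits - logits.max()
    probabilities = np.exp(logits)
    probabilities /= probabilities.sum()
    return np.random.choice(len(logits), p=probabilities)

logits = np.array([0.0, 0.5, 1.0])
for t in (2.0, 0.5, 0.1):
    picks = Counter(boltzmann(logits, t) for _ in range(10000))
    print(t, picks) # at t=0.1 nearly all picks are index 2, the largest logit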

Game

An AI that uses only the value network and searches only one move ahead. It is far too weak.

Game video https://youtu.be/W3ZqlcDg_yE

Final position diagram (image.png)
