Gomoku in Jupyter Notebook

In the Jupyter Notebook, the frames are displayed and you can arrange them in five rows. Gomoku AI is realized by searching for Monte Carlo trees and is designed to play against humans. jupyter notebookの中で五目並べ

The size of the board is 5 x 5, and if four pieces are aligned vertically, horizontally, and diagonally, the winner wins. ..

Download program files

gomoku.ipynb Confirmed operating environment: Microsoft Azure Notebooks For how to execute, see the first cell after reading the above file with jupyter notebook.

Monte Carlo tree search

Monte Carlo tree search is a method of simulating the goodness of each board while growing a tree whose board is a node (leaf) and the hand that can be hit from that board is an edge (branch).

Roughly speaking, it is a method of pre-reading as much as possible on the board that is likely to be a better hand, but also examining the board that has not been tried much. Whether a certain board is good or bad for AI is decided by randomly taking steps from that board and which one wins. This method of randomly taking action to know the ending is called playout.

In more detail, do the following:

Suppose there is a board (1).
Create a search tree (2) to (5) one step ahead of (1). At this time, do not create the same board.
Select one from (2) to (5) according to the following criteria. If it is a branch node like (2) and (3), it repeats selecting one of its child nodes.
If the number of selections of the selected leaf node is n or more, create a search tree further from there. If not, play out randomly to determine the winner.
When playout is performed, the result is propagated to the upper node.
Return to processing 3 for the specified number of times.
After the specified number of times, select the move with the most selections from the child nodes (2) to (5) on the board (1) and actually hit it.

The AI performs the above processing in order to select a move. In particular, run 3 to 5 as many times as time allows. It is said that the greater the number of times, the stronger the AI. In the case of this Gomoku AI, it is empirically known that a strong AI can be created by repeating 3 to 5 1000 times. In addition, the move that AI actually makes after repeated simulations is the move to the board that has been simulated most frequently, as in reference [^ 1].

Criteria for selecting one node from child nodes: The following formula selects the node with the maximum value.

Q(s,a)+C_p\frac{\sqrt{2\log{n_s}}}{n_{s,a}}

$ Q (s, a) $: Evaluation value corresponding to the winning percentage when you make a move on a certain board s $ C_p $: Coefficient that balances the winning percentage and the number of selections $ n_ {s} $: Number of times a board s is selected $ n_ {s, a} $: The number of times a is struck from a certain board s

Overview of processing in Jupyter Notebook

The screen is written in JavaScript and displayed in Jupyter Notebook by HTML of IPython.

from IPython.display import HTML
html = '''
<div id="dialog" title="Gomoku">
    <div id="canvas-wrapper">
        <canvas id="dlg-canvas"></canvas>
    </div>
    <button id="start">Start</button>
    <div id="status"></div>
</div>
<script>
    $("#dialog").dialog();
</script>
'''
HTML(html)

Create a function to determine which one is winning (or whether the winner has not been decided yet) on a certain board.

The state of the board is represented as an array, and it is determined whether the frames are arranged vertically, horizontally, and diagonally by manipulating the index of the array. Details will be explained in another article that has not been posted yet. 盤面のインデックス

#Winner judgment
def judge(cells,length=params["length"],n_pieces=params["n_pieces"]):
    # ....
    return 0

Create a class to judge the arrangement that will be the same board

The same board is a board that has the same frame arrangement when a certain board is viewed from the opposite side. When the AI searches for a move, the arrangement that makes the same board is the same as the result when moving forward, so for efficiency, the same board that appears second is that Do not process the destination. The details of this process will be described in another article that has not been posted yet.

#Create an index array of all the same board cases in advance,
#A class that checks if the board is the same
class cell_comparator:
    # ....
    pass

Separate the Monte Carlo tree search process and the Gomoku game part.

Details will be given in another article that has not been posted yet.

The part that depends on the type of game is implemented by class Game. By swapping this class Game, you can implement Monte Carlo tree search for different games.

#The part that changes depending on the type of game
class Game():
    #Make a victory decision. The player number is 1-Set to 1 and output the winner's number. Returns 0 if not in a winning state.
    def judge(self,state):
        return 0

    #Compare whether the two boards are the same. Returns True if they are the same, False if they are different.
    def compare(self,state0,state1):
        return False

    #Get the valid move from the board of the argument. Hands that have the same board surface do not need to be considered here.
    #MTCS is a game.Use compare to exclude moves that have the same board from valid moves and proceed with the process.
    #playout shuffles this effective hand and advances it randomly.
    def actions(self,state):
        return []

    #Take a step from the current board
    # state:Board, player:Player number to take action, action:what will you do(In the case of Gomoku, the place to hit)
    def move(self,state,action):
        return state2
    
    # playout
    def playout(self,state):
        return 0.0 #draw

    #The state of the board is created from the character expression.
    def str2state(self,state_str):
		return {}
game = Game()

It is the code of the Monte Carlo tree search part. Screen processing JavaScript calls python's next_ai method. This process performs a Monte Carlo tree search and returns the next move. The number of simulations is determined by n, which has a default argument of 1000 for next_ai. Increasing this number will increase the number of simulations and strengthen the AI.

#
#Monte Carlo tree search
#Use the victory check and board identity check in the previous cell.
#

#The board selected this number of times creates a child node as an Expand process and grows a tree.
exp_limit = 8
#Evaluation value of selection q in the selection phase+c(n) (q:Evaluation value of the board, c(n)Evaluation value of the number of selections)of
#Coefficient to be multiplied by the number of selections
cp = 1.0

#In order to reuse the State object, prepare an array in a global variable and save everything there.
#The index in this array is the state object's css_It is held by index.
css = []


# class a state representing an condition of all stones on the borad
class State():
    # ....
    pass

# next ai interface
def next_ai(state_str,n=1000):
    #Current board condition(cells), Next turn(-1), Because the first State expands unconditionally
    # expand=Create a State object with True.
    cs = State(game.str2state(state_str),expand=True)
    cs.set_css_index(0)
    
    #Add the first state to the state list of global variables.
    global css
    css = [cs]

    #Select is executed the specified number of times. In select,
    # evaluate, expand, (Implicitly)You are running a backup.
    for epoch in range(n):
        cs.select()
        pass
    return cs.solution()

AI strength

Empirically, AI with 1000 simulations is stronger than humans. If the first move is AI, the second move will often lose to AI if the first move is wrong. However, the AI of 1000 simulations sometimes makes weak moves, so humans may win at that time.

AI with 2000 simulations makes almost no mistakes. Therefore, if the first move is AI, even if there is a draw, AI will almost never lose.

What is this AI doing?

Anyway, I'm randomly picking the best ones. All that AI knows is (1) the board with four pieces in a row is a victory for AI or humans, and (2) new pieces cannot be placed on the squares where the pieces have already been placed. People think of it as a strategy for the game, for example, in this game, if you make three consecutive pieces with spaces at both ends, the opponent will prevent one end but hit the remaining end in the next round. AI doesn't know anything about the strategy of deciding the game. As a result, AI behaves as if it knows those human knowledge without implementing it.

Monte Carlo Tree Search Possibilities and Limits

Unless it is possible to fully search the search space that represents all the states of the board, some method will be used to search for hands in a limited manner. At that time, the difference in determining which is the search target and which is not to be searched is the difference in the search method. For example, with a method such as αβ search, it is not possible to know how much time is required to search without actually searching. In other words, if the search is stopped in the middle, it is often not the optimal solution at that stage. On the other hand, the Monte Carlo tree search basically searches from the place where the possibility of the optimum solution is high. Therefore, even if the search is stopped in the middle, it can be considered that the optimum solution is obtained at that stage. The search performance proportional to this time is a very good characteristic when it is practically used as a game AI or the like.

On the other hand, although it is natural, the Monte Carlo tree search cannot be effectively used for the problem that the ending of the game cannot be obtained (difficult to obtain) from a certain board called playout. In this playout part, a technique like deep learning such as learning the ending from a certain board may be taken [^ 1].

[PYTHON] Gomoku AI by Monte Carlo tree search