[PYTHON] Introducing the Min-Max strategy to Othello's AI

Introduction

Previously, I wrote an article called "Let's make Othello with wxPython." I didn't implement a decent computer AI at the time, so this time I'll try to make it a bit more respectable: I'll introduce a game strategy I've just studied, the Min-Max strategy, implement it, and have the AIs play against each other.

Things to do

Modify the wxPython Othello created in the previous article:

  1. Make the Computer vs Computer game mode a battle between two types of AI
  2. Play Computer vs Computer a specified number of times and display the win counts
  3. Create an AI with no look-ahead
  4. Create a 1-move look-ahead AI
  5. Create an AI that uses the Min-Max strategy

All of the implemented source code is on the "alpha-beta" branch on GitHub, so please grab the latest version.

Warm-up

Make Comp vs Comp a match between two types of AI

I added a switch called comp_ai and toggle it on each computer turn to switch between the two AIs. The initial value of comp_ai is chosen randomly from [1, -1]. For vs Man (human) games it is fixed at comp_ai = 0.

def doComputer(self, my_color):
    self.comp_ai *= -1
    ...

def decideComputerNext(self, pos_list, gain_list): 
    ...
    # Insert the computer AI here 
    if self.comp_ai >= 0:    # comp_ai == 0 => vs Man mode 
        # Computer AI - A 
    else: 
        # Computer AI - B

    return next_pos 

Play Comp vs Comp a specified number of times and display the win counts

This is handled in the GUI; the controls have been added at the bottom right. Select Computer vs Computer, enter the number of loops, and press "Comp vs Comp Loop" to run Comp vs Comp the specified number of times and display the number of wins for each AI. The numbers at the bottom are the AI-A wins, the AI-B wins, and the draws. A rough sketch of the loop is shown below.
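As a sketch only, the handler behind the "Comp vs Comp Loop" button does something like the following. The widget and method names here (loop_textctrl, playOneGame, result_label) are placeholders for illustration, not the actual names in the repository.

def onCompVsCompLoop(self, event):
    # Hypothetical sketch: run N Comp vs Comp games and tally the results
    loop_count = int(self.loop_textctrl.GetValue())   # number of games entered in the GUI
    results = {"AI-A": 0, "AI-B": 0, "Draw": 0}
    for _ in range(loop_count):
        winner = self.playOneGame()   # plays one full Comp vs Comp game, returns "AI-A", "AI-B" or "Draw"
        results[winner] += 1
    self.result_label.SetLabel("A: %d  B: %d  Draw: %d"
                               % (results["AI-A"], results["AI-B"], results["Draw"]))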

As an aside, [the GUI broke on Windows 10](http://qiita.com/kanlkan/items/5e6f2e63de406f46b3b1#%E5%8B%95%E4%BD%9C%E7%A2%BA%E8%AA%8D%E7%92%B0%E5%A2%83). As a workaround, I recreated the GUI with one panel and one component each, without using Sizers. Layout is hard without Sizers... Why don't Sizers work properly on Windows 10? Please let me know if you know.

Create an AI with no look-ahead

No look-ahead means simply choosing a random position from all of the legal moves.

def computerAi_Random(self, pos_list, gain_list): 
    index = random.randint(0, len(pos_list)-1) 
    return pos_list[index] 

Create a 1-move look-ahead AI

A 1-move look-ahead means placing the stone where the number of stones gained is largest. (According to Othello joseki, it's apparently better in the early game to play where the fewest stones are flipped... well, never mind.)

def computerAi_1stGainMax(self, pos_list, gain_list): 
    index_list = [] 
    max_gain = max(gain_list) 
    for i, val in enumerate(gain_list): 
        if max_gain == val: 
            index_list.append(i) 
 
    tgt = random.randint(0, len(index_list)-1) 
    return pos_list[index_list[tgt]] 

Main part

Min-Max strategy (Min-Max method)

The Min-Max strategy is a method that examines all branches of the game tree up to a certain number of moves ahead, assuming that both players always choose the move that is best for themselves. As an example, let's find the best move for Black (ourselves) on the first move of the game, thinking three moves ahead. The light-blue circles are the positions where Black can play, and the number in each circle is the number of stones that would be flipped.

(Figure: init-.png)

If Black chooses move σ00, the positions where White can play on the next turn and the number of stones each move would take are as follows.

(Figure: sigma00-.png)

If White then chooses move σ10, the positions where Black can play on the following turn and the number of stones each move would take are as follows.

(Figure: sigma10-.png)

Writing out the game tree down to Black's second move in this way gives the tree below. The evaluation value at Black's second move is the maximum number of stones that can be taken (the maximum of σ20 to σ24). The lower number in each circle of the "Black 2" column is this evaluation value.

(Figure: game-tree.png)

Next, the evaluation values are propagated to the "White 1" column, White's first move. White also chooses its best move, so it will pick the move that minimizes the value in the "Black 2" column. In other words, the minimum evaluation value of "Black 2" is propagated to the "White 1" column.

(Figure: game-tree2.png)

Next, the evaluation values are propagated from "White 1" to "Black 1". This time it is the opposite: Black wants to maximize its own evaluation value, so the maximum evaluation value of "White 1" is propagated. Thinking this way, Black's first move can be decided. The strategy is called Min-Max because the move is determined by alternately taking the minimum and the maximum of the evaluation values.

In this example the evaluation value is the same for every move from σ00 to σ03, so any of them can be played. This seems intuitively correct, since the positions playable on the first move are point-symmetric. A more interesting example would have been nice, but toward the end of the game the tree becomes very large, so I used the first move as the example.
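Before moving on to the actual implementation, here is a minimal, generic sketch of the idea, separate from the Othello code. The tree and the leaf values are made up purely to show how the minimum and the maximum alternate as values are propagated upward.

def minmax_value(node, maximizing):
    # Tiny generic Min-Max sketch: a node is either a leaf value or a list of child nodes
    if not isinstance(node, list):
        return node                     # leaf: return its evaluation value as-is
    child_values = [minmax_value(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# Made-up tree: our turn (maximize), then the opponent's turn (minimize), then leaf evaluations
tree = [[3, 1], [2, 5], [4, 4]]
print(minmax_value(tree, True))   # -> 4: we pick the branch whose worst case is best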

Implementation

Now let's implement it. Although this is a 3-move look-ahead, the max_depth passed to minMax is 2: the final level of evaluation values is propagated inside computerAi_MinMax_3 itself, so the argument is the look-ahead depth minus 1.

    def computerAi_MinMax_3(self, pos_list, gain_list):
        value = []
        update_pos_list = []

        self.log_on = False                  # suppress logging while searching
        value = self.minMax(2, 2, pos_list, gain_list)
        for i, pos in enumerate(pos_list):
            if max(value) == value[i]:
                update_pos_list.append(pos)  # collect every move with the best evaluation value

        self.log_on = True
        tgt = random.randint(0, len(update_pos_list)-1)
        return update_pos_list[tgt]          # break ties randomly

    def minMax(self, depth, max_depth, pos_list, gain_list):  # depth > 0
        value = []
        next_value = []
        next_pos_list = []
        next_gain_list = []
        self.backUpAllState(self.state_storage_list[depth])   # save the board state for this depth
        for pos in pos_list:
            ret = self.putComputerStone(pos, False)            # try the move
            next_pos_list, next_gain_list = self.scanPuttableCell()
            if (depth > 1):
                # Recurse one level deeper and propagate min or max depending on whose turn it is
                next_value = self.minMax(depth-1, max_depth, next_pos_list, next_gain_list)
                if len(next_value) == 0:
                    value.append(0)                            # no legal move at the deeper level
                elif (max_depth - depth) % 2 == 0:
                    value.append(min(next_value))              # next level is the opponent's turn -> take the minimum
                else:
                    value.append(max(next_value))              # next level is our own turn -> take the maximum
            else:
                # Bottom of the search: use the raw gains as the evaluation values
                if len(next_gain_list) == 0:
                    value.append(0)
                elif (max_depth - depth) % 2 == 0:
                    value.append(min(next_gain_list))
                else:
                    value.append(max(next_gain_list))

            self.restoreAllState(self.state_storage_list[depth])   # undo the trial move

        return value

There is still a bit of redundancy, but I was able to implement it fairly cleanly. It took quite a while to get this far, though...
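For reference, once these AI methods exist, the dispatch in decideComputerNext shown earlier can simply call them. The pairing below (Min-Max as AI-A, the no-look-ahead AI as AI-B) is only one example of how to wire it up; swap in whichever two AIs you want to compare.

def decideComputerNext(self, pos_list, gain_list):
    ...
    # Insert the computer AI here
    if self.comp_ai >= 0:    # comp_ai == 0 => vs Man mode
        next_pos = self.computerAi_MinMax_3(pos_list, gain_list)    # Computer AI - A
    else:
        next_pos = self.computerAi_Random(pos_list, gain_list)      # Computer AI - B

    return next_pos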

Pitting the AIs against each other

Let's have the AIs play 100 games against each other. Which AI moves first is decided randomly each game.

  1. Min-Max (3-move look-ahead) vs no look-ahead
  2. Min-Max (3-move look-ahead) vs 1-move look-ahead

The result is as follows.

  1. Min-Max 70 wins, 25 losses, 5 draws
  2. Min-Max 48 wins, 48 losses, 4 draws

Hmm, the Min-Max strategy isn't very strong. I believe the implementation is correct. Well, at least it beats the no-look-ahead AI, so that's some consolation. Note that this Min-Max strategy is quite time-consuming: if you run Computer vs Computer yourself, 100 games take roughly 10 minutes depending on your PC's performance. In the future I'd like to implement the αβ (alpha-beta) strategy, which speeds things up by narrowing the search range of the Min-Max strategy.

Postscript

One possible reason the Min-Max strategy didn't perform well is that every stone was valued at +1, so let's weight each board position instead. I had a rough idea of which positions should be valued highly, but I wasn't sure of the details, so I referred to this site: http://uguisu.skr.jp/othello/5-1.html

So I weight the board as follows and modify the code so that the gain is calculated from these weights.

(Figure: weight.png)

gGain = [( 30, -12,  0, -1, -1,  0, -12,  30), \
         (-12, -15, -3, -3, -3, -3, -15, -12), \
         (  0,  -3,  0, -1, -1,  0,  -3,   0), \
         ( -1,  -3, -1, -1, -1, -1,  -3,  -1), \
         ( -1,  -3, -1, -1, -1, -1,  -3,  -1), \
         (  0,  -3,  0, -1, -1,  0,  -3,   0), \
         (-12, -15, -3, -3, -3, -3, -15, -12), \
         ( 30, -12,  0, -1, -1,  0, -12,  30)]
...

    def vecScan(self, pos, reverse_on):
        rev_list = []
        temp_list = []
        gain = 0
        is_hit = 0
        if reverse_on == 0 and self.getCellState(pos,(0,0)) != "green":
            return 0, gain

        if self.now_color == "black":
            rev_color = "white"
        else:
            rev_color = "black"
            
        # Scan outward in each of the 8 directions
        for v in gVec:
            temp_list = []
            for i in range(1, 8):
                if self.getCellState(pos,(v[0]*i,v[1]*i)) == rev_color:
                    temp_list.append(self.movePos(pos,(v[0]*i, v[1]*i)))
                    if self.getCellState(pos,(v[0]*i+v[0], v[1]*i+v[1])) == self.now_color:
                        is_hit = 1
                        for j in temp_list:
                            rev_list.append(j)
                        break
                else:
                    break
        
        # When reverse_on is True, actually flip the captured stones and update the score
        if reverse_on == True:
            if self.log_on == True:
                self.log_textctrl.AppendText("put:" + str(pos) + ", "  + str(rev_list) + " to " + str(self.now_color) + "\n")
            for rev_pos in rev_list:
                self.setCellState(rev_pos, (0,0), self.now_color)
                if self.now_color == "black":
                    self.player_score[0] += 1
                    self.player_score[1] -= 1
                else:
                    self.player_score[1] += 1
                    self.player_score[0] -= 1
                self.updateScoreLabel()
        
        gain = self.calcGain(pos, rev_list)   # evaluate the move with the position weights
        return is_hit, gain

    def calcGain(self, pos, rev_list):
        ret_gain = 0
        ret_gain += gGain[pos[0]][pos[1]]               # weight of the cell the stone is placed on

        for rev_pos in rev_list:
            ret_gain += gGain[rev_pos[0]][rev_pos[1]]   # plus the weights of all flipped cells

        return ret_gain
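As a quick sanity check of the weighted gain (with made-up coordinates, evaluated outside the game class): placing a stone on the corner (0, 0) while flipping the stone at (0, 1) now scores far higher than a center move.

rev_list = [(0, 1)]                                            # hypothetical list of flipped positions
gain = gGain[0][0] + sum(gGain[r][c] for (r, c) in rev_list)
print(gain)   # 30 + (-12) = 18

For comparison, placing at (3, 3) and flipping (3, 4) gives -1 + (-1) = -2, so the weighted gain strongly prefers corner-taking moves.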

Let's run the matches again.

  1. Min-Max (3-move look-ahead) vs no look-ahead
  2. Min-Max (3-move look-ahead) vs 1-move look-ahead
  3. 1-move look-ahead vs no look-ahead

The result is as follows.

  1. Min-Max 38 wins, 55 losses, 7 draws
  2. Min-Max 19 wins, 80 losses, 1 draw
  3. 1-move look-ahead 83 wins, 16 losses, 1 draw

Hmm... Min-Max's win rate actually dropped, and the 1-move look-ahead turned out to be quite strong. There is clearly room to reconsider the approach. Sorry for the underwhelming result.
