[PYTHON] Try to make a blackjack strategy by reinforcement learning (② Register the environment in gym)

Introduction

While studying Python and reinforcement learning, I tried to build a strategy for blackjack. There is a well-known probability-based strategy called the basic strategy, and my goal is to see whether the learned strategy can catch up with it.

I will proceed in the following order:

  1. Blackjack implementation
  2. Register it as an OpenAI Gym environment ← covered in this article
  3. Learn blackjack strategy with reinforcement learning

What is OpenAI gym?

It is a platform that provides research environments for reinforcement learning. Environments (games) such as CartPole and mazes are included, so you can easily try out reinforcement learning. Every OpenAI Gym environment exposes a common interface: it receives an action from the agent and returns the resulting next state and reward. It can be installed as follows; please refer to other pages for the details. In what follows, I assume the installation has been completed.

pip install gym
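
If you want to confirm that the installation succeeded (just a quick check, assuming a standard Python setup), importing the package and printing its version is enough:

import gym
print(gym.__version__)  # confirms that gym can be imported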

This time, I will register my own blackjack game as an environment in OpenAI Gym so that reinforcement learning can be run on it.

Review of reinforcement learning

First, let's quickly review reinforcement learning. The "agent" observes the "state" of the "environment" and takes an "action" on it. The "environment" then feeds the updated "state" and a "reward" back to the "agent". The goal of reinforcement learning is to acquire a way of choosing "actions" (= a policy) that maximizes the sum of the "rewards" obtained in the future.
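
As a concrete illustration of this loop, here is a minimal sketch that runs one episode of the bundled CartPole environment with a random agent and accumulates the discounted return. CartPole is used only as a stand-in (the blackjack environment is registered later in this article), and the sketch assumes the classic gym API used here, where step returns four values.

import gym

# Minimal sketch of the agent-environment loop, using CartPole as a stand-in.
# The "agent" here just samples random actions; reinforcement learning replaces
# this with a policy that maximizes the discounted sum of future rewards.
env = gym.make('CartPole-v0')
gamma = 0.99                   # discount factor
state = env.reset()            # observe the initial state
total_return, t, done = 0.0, 0, False
while not done:
    action = env.action_space.sample()            # random action instead of a learned policy
    state, reward, done, info = env.step(action)  # environment returns next state and reward
    total_return += (gamma ** t) * reward
    t += 1
print(total_return)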

Applying Reinforcement Learning Elements to Blackjack

In this blackjack, we consider reinforcement learning as follows.

- Environment: Blackjack
- Agent: Player
- State: Player's cards, Dealer's card, etc.
- Action: The Player's choice: Hit, Stand, etc.
- Reward: Chips obtained in the game


Procedure for registering the environment in OpenAI Gym

Follow the steps below to register your own environment in OpenAI Gym.

  1. Create a blackjack environment class "BlackJackEnv" that inherits gym.Env of OpenAI Gym
  2. Register the environment using the gym.envs.registration.register function so that it can be called with the ID BlackJack-v0.

Development environment

File organization

The file structure is as follows. Note that there are two files named __init__.py.

└─ myenv
    ├─ __init__.py  ---> Calls BlackJackEnv
    └─ env
       ├─ __init__.py  ---> Indicates where BlackJackEnv is located
       ├─ blackjack.py  ---> The BlackJack game itself
       └─ blackjack_env.py  ---> Defines the BlackJackEnv class that inherits OpenAI Gym's gym.Env

Then, follow the procedure to register the environment.

Create a blackjack environment class "BlackJackEnv" that inherits gym.Env of OpenAI Gym

myenv/env/blackjack.py: The Blackjack code created last time is left as it is. It is imported and used by blackjack_env.py below.

myenv/env/blackjack_env.py: Create the "BlackJackEnv" class, the BlackJack game environment to be registered in OpenAI Gym. Inherit gym.Env and implement the following three properties and five methods.

Properties

- action_space: The actions the player (agent) can select
- observation_space: The information about the game environment that the player (agent) can observe
- reward_range: The range from the minimum to the maximum reward

Methods

- reset: Resets the environment.
- step: Executes an action in the environment and returns the result.
- render: Visualizes the environment.
- close: Closes the environment; used at the end of learning.
- seed: Fixes the random seed.
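
Putting these pieces together, such a class has roughly the following shape. This is only a generic skeleton with placeholder values, not the blackjack logic; the actual BlackJackEnv is shown further below.

import gym
import gym.spaces
import numpy as np

# Generic skeleton of a custom environment (placeholder logic only).
class MyEnv(gym.Env):
    def __init__(self):
        super().__init__()
        self.action_space = gym.spaces.Discrete(2)      # actions the agent may take
        self.observation_space = gym.spaces.Box(
            low=np.array([0.0]), high=np.array([1.0]))  # what the agent can observe
        self.reward_range = [-1, 1]                     # minimum and maximum reward

    def reset(self):
        return np.array([0.0])      # initial observation

    def step(self, action):
        observation = np.array([0.0])
        reward = 0.0
        done = True                 # end the episode immediately in this placeholder
        return observation, reward, done, {}

    def render(self, mode='human'):
        pass

    def close(self):
        pass

    def seed(self, seed=None):
        pass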

action_space property

It shows that you can take four actions: Stand, Hit, Double Down, and Surrender.

action_space


self.action_space = gym.spaces.Discrete(4)
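
As a quick aside (a standalone snippet, not part of the environment class), gym.spaces.Discrete(4) represents the integers 0 to 3, which the step method later maps to the four actions:

import gym.spaces

action_space = gym.spaces.Discrete(4)
print(action_space.n)          # 4
print(action_space.sample())   # a random integer between 0 and 3
print(action_space.contains(3), action_space.contains(4))  # True False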
observation_space property

Observe four values: the total points of the Player's hand, the points of the Dealer's face-up card, a flag indicating a soft hand (the Player's hand contains an A), and a flag indicating whether the Player has already hit. Determine the maximum and minimum value for each.

observation_space


        high = np.array([
            30,  # player max
            30,  # dealer max
            1,   # is_soft_hand
            1,   # hit flag true
        ])
        low = np.array([
            2,  # player min
            1,  # dealer min
            0,  # is_soft_hand false
            0,  # hit flag false
        ])
        self.observation_space = gym.spaces.Box(low=low, high=high)
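
For reference (again a standalone snippet), a Box defined this way treats each observation as a 4-element vector, so a state such as "player total 14 against a dealer upcard of 10, hard hand, not yet hit" lies inside the space:

import gym.spaces
import numpy as np

low = np.array([2, 1, 0, 0])
high = np.array([30, 30, 1, 1])
observation_space = gym.spaces.Box(low=low, high=high)

obs = np.array([14, 10, 0, 0], dtype=np.float32)  # player 14, dealer upcard 10, hard hand, no hit yet
print(observation_space.contains(obs))  # True
print(observation_space.shape)          # (4,)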
reward_range property

Determine the range of rewards. Here, it is decided to include the minimum and maximum values of chips that can be obtained.

reward_range


        self.reward_range = [-10000, 10000]
reset method

Initialize self.done, reset the Player's and Dealer's hands with self.game.reset_game(), bet chips (Bet), and deal the cards (Deal). As described under the step method, self.done is a Boolean value indicating whether the hand has been settled. The four observed state values are returned with self.observe(). Note that for training we top the player's chip balance back up each hand so that it never runs out.

reset()


    def reset(self):
        #Initializes the state and returns the initial observations
        #Initialize various variables
        self.done = False

        self.game.reset_game()
        self.game.bet(bet=100)
        self.game.player.chip.balance = 1000  # Top up the balance so the player's chips never run out during training
        self.game.deal()
        # self.bet_done = True

        return self.observe()
step method

The Player takes one of Stand, Hit, Double down, or Surrender against the environment. When the Player's turn ends, the chips are settled. Finally, the following four pieces of information are returned.

- observation: The observed state of the environment.
- reward: The amount of reward earned by the action.
- done: A Boolean indicating whether the environment should be reset; in BlackJack, whether the hand has been settled (win or lose).
- info: A dictionary that can hold useful information for debugging.

Also, in this learning environment, choosing Double down or Surrender after having hit is treated as a rule violation and penalized.

step()


    def step(self, action):
        #Execute action and return the result
        # Advance the game by one step. Returns observation, reward, done (whether the hand has finished), and info (a dict of additional information).

        if action == 0:
            action_str = 's'  # Stand
        elif action == 1:
            action_str = 'h'  # Hit
        elif action == 2:
            action_str = 'd'  # Double down
        elif action == 3:
            action_str = 'r'  # Surrender
        else:
            print(action)
            print("Undefined Action")
            print(self.observe())

        hit_flag_before_step = self.game.player.hit_flag
        self.game.player_step(action=action_str)

        if self.game.player.done:
            #At the end of the player's turn
            self.game.dealer_turn()
            self.game.judge()
            reward = self.get_reward()
            self.game.check_deck()
            print(str(self.game.judgment) + " : " + str(reward))


        elif action >= 2 and hit_flag_before_step is True:
            reward = -1e3  #Give a penalty if you violate the rules

        else:
            #When continuing a player's turn
            reward = 0

        observation = self.observe()
        self.done = self.is_done()
        return observation, reward, self.done, {}

This time, the render, close, and seed methods are left as empty stubs and are not used.
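
For reference only, if reproducible shuffling were needed later, a seed method could follow the common gym pattern sketched below. This is an assumption-laden sketch, not code used in this article; the deck would also have to draw cards from self.np_random instead of the global random module.

from gym.utils import seeding

class SeededEnvSketch:
    def seed(self, seed=None):
        # Fix the random seed; card drawing would then need to use self.np_random
        self.np_random, seed = seeding.np_random(seed)
        return [seed]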

The whole code of blackjack_env.py looks like this:

myenv/env/blackjack_env.py


import gym
import gym.spaces
import numpy as np

from myenv.env.blackjack import Game


class BlackJackEnv(gym.Env):
    metadata = {'render.modes': ['human', 'ansi']}

    def __init__(self):
        super().__init__()

        self.game = Game()
        self.game.start()

        # Set action_space, observation_space, and reward_range
        self.action_space = gym.spaces.Discrete(4)  # hit, stand, double down, surrender

        high = np.array([
            30,  # player max
            30,  # dealer max
            1,   # is_soft_hand
            1,   # hit flag true
        ])
        low = np.array([
            2,  # player min
            1,  # dealer min
            0,  # is_soft_hand false
            0,  # hit flag false
        ])
        self.observation_space = gym.spaces.Box(low=low, high=high)
        self.reward_range = [-10000, 10000]  #List of minimum and maximum rewards

        self.done = False
        self.reset()

    def reset(self):
        #Initializes the state and returns the initial observations
        #Initialize various variables
        self.done = False

        self.game.reset_game()
        self.game.bet(bet=100)
        self.game.player.chip.balance = 1000  # Top up the balance so the player's chips never run out during training
        self.game.deal()
        # self.bet_done = True

        return self.observe()

    def step(self, action):
        #Execute action and return the result
        # Advance the game by one step. Returns observation, reward, done (whether the hand has finished), and info (a dict of additional information).

        if action == 0:
            action_str = 's'  # Stand
        elif action == 1:
            action_str = 'h'  # Hit
        elif action == 2:
            action_str = 'd'  # Double down
        elif action == 3:
            action_str = 'r'  # Surrender
        else:
            print(action)
            print("Undefined Action")
            print(self.observe())

        hit_flag_before_step = self.game.player.hit_flag
        self.game.player_step(action=action_str)

        if self.game.player.done:
            #At the end of the player's turn
            self.game.dealer_turn()
            self.game.judge()
            reward = self.get_reward()
            self.game.check_deck()
            print(str(self.game.judgment) + " : " + str(reward))


        elif action >= 2 and hit_flag_before_step is True:
            reward = -1e3  #Give a penalty if you violate the rules

        else:
            #When continuing a player's turn
            reward = 0

        observation = self.observe()
        self.done = self.is_done()
        return observation, reward, self.done, {}

    def render(self, mode='human', close=False):
        #Visualize the environment
        #In the case of human, it is output to the console. Returns StringIO for ansi
        pass

    def close(self):
        #Close the environment and perform post-processing
        pass

    def seed(self, seed=None):
        #Random seeds fixed
        pass

    def get_reward(self):
        #Return reward
        reward = self.game.pay_chip() - self.game.player.chip.bet
        return reward

    def is_done(self):
        if self.game.player.done:
            return True
        else:
            return False

    def observe(self):
        if self.game.player.done:
            observation = tuple([
                self.game.player.hand.calc_final_point(),
                self.game.dealer.hand.calc_final_point(),  #Dealer card total score
                int(self.game.player.hand.is_soft_hand),
                int(self.game.player.hit_flag)])
        else:
            observation = tuple([
                self.game.player.hand.calc_final_point(),
                self.game.dealer.hand.hand[0].point,  #Dealer up card only
                int(self.game.player.hand.is_soft_hand),
                int(self.game.player.hit_flag)])

        return observation

Register the environment using the gym.envs.registration.register function so that it can be called with the ID BlackJack-v0.

myenv/__init__.py: Register BlackJackEnv with gym using the gym.envs.registration.register function. Here we declare that the BlackJackEnv class under myenv/env is called with the ID BlackJack-v0.

myenv/__init__.py


from gym.envs.registration import register

register(
    id='BlackJack-v0',
    entry_point='myenv.env:BlackJackEnv',
)

myenv/env/__init__.py: Declare that the BlackJackEnv class is located in blackjack_env.py under myenv/env.

myenv/env/__init__.py


from myenv.env.blackjack_env import BlackJackEnv

Using the environment for reinforcement learning

In the reinforcement learning code, you can use the environment with env = gym.make('BlackJack-v0').

Since registering the environment is the focus this time, I will omit the reinforcement learning code; it will be created in the next article.
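
That said, the registration can be sanity-checked right away with a short script that plays a few hands using random actions. This is a hypothetical test script; it assumes the myenv package is importable from the current directory.

import gym
import myenv  # importing the package runs register() in myenv/__init__.py

env = gym.make('BlackJack-v0')

# Play a few hands with random actions just to confirm the environment responds
for episode in range(3):
    observation = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
    print(episode, observation, reward)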

In closing

I registered my own blackjack game as an OpenAI Gym environment. I realized that I had to think carefully about what counts as an action, what to observe as the state, what to use as the reward, and how far a single step should advance the environment I made. At first, I had made a single step far too long.

Next, I would like to use this environment to learn blackjack strategies.

