[Python] [How to!] Learn and play Super Mario with TensorFlow!!

Introduction

After the last post, I received many requests for training on Super Nintendo games, so this time I am sharing a program that learns to play an NES game and runs on a MacBook or similar machine.

I uploaded the code to GitHub so that you can try the experiment yourself. If you download it, please give it a star! https://github.com/tsunaki00/super_mario

Experiment environment

I tested it in the following machine environment.

Item                  Value
PC                    MacBook Pro 2016 (OS X)
CPU                   i7
MEM                   16GB
Development language  Python

Supplement

The TensorFlow training code is written fairly roughly, so please improve it as you see fit.

Precautions about the environment

Running the program as-is produces the following error, so the library needs to be patched.

$ python3 start.py 
  Traceback (most recent call last):
    File "start.py", line 22, in <module>
      import gym_pull
    File "/usr/local/lib/python3.6/site-packages/gym_pull/__init__.py", line 41, in <module>
      import gym_pull.monitoring.monitor
    File "/usr/local/lib/python3.6/site-packages/gym_pull/monitoring/monitor.py", line 10, in <module>
      class Monitor(gym.monitoring.monitor.Monitor):
  AttributeError: module 'gym.monitoring' has no attribute 'monitor'

↓ Fix it as follows

$ vi /usr/local/lib/python3.6/site-packages/gym_pull/monitoring/monitor.py
   :
   :
  class Monitor(gym.monitoring.monitor.Monitor):
   ↓
  class Monitor(gym.monitoring.monitor_manager.MonitorManager):
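
If you would rather not edit the file by hand, the same one-line substitution can be scripted. The following is only a minimal sketch and is not part of the original repository: the function name patch_monitor_source is hypothetical, and it just applies the text replacement shown above to a string. Reading the installed monitor.py and writing the result back is left to you.

```python
# Base class named in the traceback, and the replacement used in the manual fix.
OLD_BASE = "gym.monitoring.monitor.Monitor"
NEW_BASE = "gym.monitoring.monitor_manager.MonitorManager"


def patch_monitor_source(source: str) -> str:
    """Return `source` with the Monitor base class swapped (hypothetical helper)."""
    return source.replace(
        f"class Monitor({OLD_BASE}):",
        f"class Monitor({NEW_BASE}):",
    )


# Demonstration on an in-memory string instead of the real installed file.
broken = "class Monitor(gym.monitoring.monitor.Monitor):\n    pass\n"
patched = patch_monitor_source(broken)
print(patched)
```

Applying the helper twice is harmless, because the old class line no longer appears in the patched text.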

A brief description of the program

The learning method is reinforcement learning. Unlike supervised or unsupervised learning, the program learns from an evaluation (reward) of the actions it has actually performed.
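
To make "learning from an evaluation of your own actions" a little more concrete, here is a minimal tabular Q-learning sketch. It is not the network used in this article: the two-state toy problem, the learning rate alpha, and the discount gamma are all assumptions chosen purely for illustration.

```python
import numpy as np


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    # Target = immediate reward plus the discounted best value of the next state.
    target = reward + gamma * np.max(q[next_state])
    q[state, action] += alpha * (target - q[state, action])
    return q[state, action]


# Toy problem (hypothetical): 2 states, 2 actions, all values start at zero.
q_table = np.zeros((2, 2))
# Taking action 1 in state 0 earned a reward of 1.0 and led to state 1.
new_value = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
print(new_value)
```

Only the entry for the action that was actually taken changes, which is the sense in which the agent learns from its own experience rather than from labeled examples.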

[Reference] Deep Q-Network

DQN is explained in the following article: "History of DQN + Deep Q-Network written in Chainer".

[Reference] Learn human operations.

By training on evaluations of the results of our own play, we were able to produce the very fast Mario shown below! The evaluation is based on distance, time, and score.

[[Artificial Intelligence] I tried to let AIVA play Mario![WORLD1-1]] https://www.youtube.com/watch?v=T4dO1GKPx4Y
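
The text above says the evaluation uses distance, time, and score. One possible way to fold those three into a single per-step number is sketched below. This is not the evaluation function used for the video: the 'distance' key appears in this article's code, but the 'time' and 'score' keys and all three weights are assumptions made for illustration.

```python
def shaped_reward(prev_info, info, w_distance=1.0, w_time=0.1, w_score=0.01):
    """Combine per-step changes in distance, time and score (hypothetical weights)."""
    d_distance = info["distance"] - prev_info["distance"]  # progress made this step
    d_time = prev_info["time"] - info["time"]              # clock ticks used up
    d_score = info["score"] - prev_info["score"]           # points gained
    # Reward progress and points, penalize elapsed time.
    return w_distance * d_distance - w_time * d_time + w_score * d_score


# Made-up info dictionaries from two consecutive steps.
before = {"distance": 40, "time": 400, "score": 0}
after = {"distance": 45, "time": 399, "score": 100}
r = shaped_reward(before, after)
print(r)
```

The weights decide what "good play" means: a larger w_time pushes toward speed, a larger w_score toward collecting points.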

Source code

- Narrow down the set of actions appropriately.
- The arrays are laid out so that they can be used for mini-batch processing; change them as needed.
- Adding some randomness once in a while will make it even better!
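
As a rough illustration of the first and third notes (this is a separate sketch, not part of the program listed after it), the action set can be cut down to a handful of useful button combinations, and a random action can be mixed in with a small probability (epsilon-greedy). The button order [up, left, down, right, A, B] is an assumption about the ppaquette environment, and the Q-values are made up, so check both against your own setup.

```python
import random

import numpy as np

# Hypothetical reduced action set, assuming the order [up, left, down, right, A, B].
NARROWED_ACTIONS = [
    [0, 0, 0, 0, 0, 0],  # no-op
    [0, 0, 0, 1, 0, 0],  # right
    [0, 0, 0, 1, 1, 0],  # right + A
    [0, 0, 0, 1, 0, 1],  # right + B
    [0, 0, 0, 1, 1, 1],  # right + A + B
]


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: a random index with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return int(np.argmax(q_values))


q_values = np.array([0.1, 0.3, 0.9, 0.2, 0.4])  # made-up network outputs
greedy_index = choose_action(q_values, epsilon=0.0)
print(NARROWED_ACTIONS[greedy_index])
```

With epsilon set to 0.0 the choice is purely greedy; a small value such as 0.05 would occasionally try something else, which is one way to "add randomness once in a while".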

import tensorflow as tf
import gym
import gym_pull
import ppaquette_gym_super_mario
from gym.wrappers import Monitor
import random
import numpy as np
class Game :

  def __init__(self):
    self.episode_count = 10000
    # Select the stage (World 1-1, tile-based observation)
    self.env = gym.make('ppaquette/SuperMarioBros-1-1-Tiles-v0')

  def weight_variable(self, shape):
    initial = tf.truncated_normal(shape, stddev = 0.01)
    return tf.Variable(initial)

  
  def bias_variable(self, shape):
    initial = tf.constant(0.01, shape = shape)
    return tf.Variable(initial)

  def conv2d(self, x, W, stride):
    return tf.nn.conv2d(x, W, strides = [1, stride, stride, 1], padding = "SAME")

  def max_pool_2x2(self, x):
    return tf.nn.max_pool(x, ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding = "SAME")

  
  def create_network(self, action_size):
    # Three convolutional layers followed by one fully connected readout layer
    W_conv1 = self.weight_variable([8, 8, 1, 16])
    b_conv1 = self.bias_variable([16])
    W_conv2 = self.weight_variable([4, 4, 16, 32])
    b_conv2 = self.bias_variable([32])
    W_conv3 = self.weight_variable([4, 4, 32, 64])
    b_conv3 = self.bias_variable([64])
    # With SAME padding the 13x16 input shrinks to 7x8 (stride 2), then to 4x4
    # (stride 2), and stays 4x4 (stride 1), so the flattened size is 4*4*64 = 1024
    W_fc1 = self.weight_variable([1024, action_size])
    b_fc1 = self.bias_variable([action_size])
    # Input: a batch of 13x16 tile maps with a single channel
    s = tf.placeholder("float", [None, 13, 16, 1])
    # hidden layers
    h_conv1 = tf.nn.relu(self.conv2d(s, W_conv1, 2) + b_conv1)
    h_conv2 = tf.nn.relu(self.conv2d(h_conv1, W_conv2, 2) + b_conv2)
    h_conv3 = tf.nn.relu(self.conv2d(h_conv2, W_conv3, 1) + b_conv3)
    h_conv3_flat = tf.reshape(h_conv3, [-1, 1024])
    # One output value per action
    readout = tf.matmul(h_conv3_flat, W_fc1) + b_fc1
    return s, readout


  def play_game(self) :
    # Build all 64 on/off combinations of the 6 controller buttons as 0/1 lists
    action_list = []
    for i in range(64) :
      command = format(i, '06b')  # zero-padded 6-digit binary string
      action_list.append([int(bit) for bit in command])
    sess = tf.InteractiveSession()
    s, readout = self.create_network(len(action_list))
    a = tf.placeholder("float", [None, len(action_list)])
    y = tf.placeholder("float", [None, 1])
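    # Pick out the network output for the action that was actually taken
    readout_action = tf.reduce_sum(tf.multiply(readout, a), axis = 1)
    # y has shape [batch, 1]; flatten it so it lines up with readout_action ([batch])
    cost = tf.reduce_mean(tf.square(tf.reshape(y, [-1]) - readout_action))
    train_step = tf.train.AdamOptimizer(1e-6).minimize(cost)
    saver = tf.train.Saver()
    sess.run(tf.global_variables_initializer())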
    # Look in the directory that saver.save() below writes to
    checkpoint = tf.train.get_checkpoint_state("./saved_networks")
    if checkpoint and checkpoint.model_checkpoint_path:
      saver.restore(sess, checkpoint.model_checkpoint_path)
      print ("Successfully loaded:", checkpoint.model_checkpoint_path)
    else:
      print ("Could not find old network weights")
    for episode in range(self.episode_count):
      self.env.reset()
      total_score = 0
      distance = 0
      is_finished = False
      actions, rewards, images = [], [], []
      while not is_finished :
        # Get the tile map of the current screen (to learn from the image instead, use self.env.screen)
        screen = np.reshape(self.env.tiles, (13, 16, 1))
        # Act randomly for the first 10 episodes, then follow the network greedily
        if episode < 10 :
          action_index = random.randint(0, len(action_list) - 1)
        else :
          readout_t = readout.eval(feed_dict = {s : [screen]})[0]
          action_index = np.argmax(readout_t)
        # (1) Send the chosen action to Mario on the screen (self.env.step)
        obs, reward, is_finished, info = self.env.step(action_list[action_index])
        # Keep the action as a one-hot array so it can be fed as a mini-batch
        action_array = np.zeros(len(action_list))
        action_array[action_index] = 1
        actions.append(action_array)
        # (2) Give a reward: the distance travelled so far is the training target
        rewards.append([float(info['distance'])])
        images.append(screen)

        train_step.run(feed_dict = {
          a : actions, y : rewards, s : images
        })
        print('Episode : ', episode, 'Actions : ', action_list[action_index], 'Rewards', reward)
        actions, rewards, images = [], [] ,[]
        
        self.env.render()
      saver.save(sess, 'saved_networks/model-dqn', global_step = episode)

if __name__ == '__main__' :
  game = Game()
  game.play_game()
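
The size of the flattened tensor that feeds the fully connected layer in create_network can be checked by hand. The sketch below is a small helper (the function names are hypothetical) that applies the SAME-padding rule, output = ceil(input / stride), to the 13 x 16 tile map with strides 2, 2, 1 and 64 channels in the last convolution.

```python
import math


def same_conv_output(size, stride):
    """Spatial output size of a SAME-padded convolution: ceil(size / stride)."""
    return math.ceil(size / stride)


def flattened_size(height, width, strides, channels):
    """Number of values per input after the conv stack is flattened."""
    for stride in strides:
        height = same_conv_output(height, stride)
        width = same_conv_output(width, stride)
    return height * width * channels


# 13 x 16 tile map, conv strides 2, 2, 1, and 64 channels after the last conv.
flat = flattened_size(13, 16, strides=[2, 2, 1], channels=64)
print(flat)
```

The height goes 13, 7, 4, 4 and the width goes 16, 8, 4, 4, so each screen yields 4 * 4 * 64 = 1024 values, which is the number that the reshape and W_fc1 both need to use. If you change the input size or the strides, rerun this arithmetic before editing the layer shapes.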
  

Run


$ python3 start.py

Finally

A Docker + browser version will be covered in a separate post.

Many technical books on AI and related topics have been published recently, but I feel the hurdle to getting started is still quite high. I think practicing with something familiar like this is a good way in!

We will hold a separate study session covering the details of programming for game AI and Python. Please join us if you are interested.

We have started a tech Twitter account. We will update it from time to time, so please follow us if you like. https://twitter.com/gauss_club
