[PYTHON] Reinforcement learning 37: Auto-starting Atari games with a wrapper

This article is intended for AI beginners, from junior high school students to university students.

When using OpenAI Gym's Atari environments, I want a wrapper that takes care of each game's peculiarities. For example, in Breakout you start with 5 lives, but the game does not proceed unless you press the FIRE button at the start and again every time a life is lost (when you miss the ball). If this is not handled, the agent just sits there and learning makes no progress at all, which is frankly a waste of time. A wrapper called FireResetEnv already exists, but it only presses FIRE on reset. In Breakout the game stops again every time a life is lost, so after a single miss it stays frozen. So I wrote the wrapper below.
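Before that wrapper, a small simulation may make the problem concrete. It needs neither gym nor an Atari ROM: `ToyServeEnv` is a hypothetical stand-in for Breakout in which the ball only moves after FIRE (action 1), a life is lost every few steps, and after each miss the game waits for FIRE again. `FireOnResetOnly` is a simplified sketch of reset-only firing, not chainerrl's actual FireResetEnv. Both names are made up for illustration.

```python
class ToyServeEnv:
    """Hypothetical stand-in for Breakout (classic 4-tuple step API).

    Actions follow Breakout's layout: 0 NOOP, 1 FIRE, 2 RIGHT, 3 LEFT.
    The ball only moves after FIRE; it is missed every `rally_len` steps,
    which costs a life and makes the game wait for FIRE again.
    """

    def __init__(self, lives=5, rally_len=3):
        self.start_lives = lives
        self.rally_len = rally_len

    def reset(self):
        self.lives = self.start_lives
        self.served = False
        self.t = 0
        return 0

    def step(self, action):
        if not self.served:
            # Waiting for a launch: only FIRE starts play, nothing else happens.
            self.served = action == 1
            return int(self.served), 0.0, False, {"lives": self.lives}
        self.t += 1
        if self.t >= self.rally_len:
            # Ball missed: lose a life and wait for FIRE again.
            self.lives -= 1
            self.served = False
            self.t = 0
        return int(self.served), 1.0, self.lives == 0, {"lives": self.lives}


class FireOnResetOnly:
    """Simplified sketch of a wrapper that presses FIRE only on reset."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        self.env.reset()
        obs, _, _, _ = self.env.step(1)  # FIRE once, only at the start
        return obs

    def step(self, action):
        return self.env.step(action)


def run(env, action, max_steps=50):
    """Repeat one action for up to max_steps; return (total reward, done, lives)."""
    env.reset()
    total, done, info = 0.0, False, {}
    for _ in range(max_steps):
        _, reward, done, info = env.step(action)
        total += reward
        if done:
            break
    return total, done, info["lives"]


# An agent that only ever presses RIGHT (2) scores during the first life,
# then the game waits for FIRE forever and the episode never ends.
print(run(FireOnResetOnly(ToyServeEnv()), action=2))  # → (3.0, False, 4)
```

Only an agent that happens to choose FIRE after each miss gets past the first life, and early in training it has no reason to do so.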

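```python
import gym


class FireResetEnvAuto(gym.Wrapper):
    """Press FIRE on reset and again every time a life is lost.

    Uses the classic gym API (gym < 0.26), where step() returns
    (obs, reward, done, info).
    """

    def __init__(self, env):
        super().__init__(env)
        # Action 1 must be FIRE, and the game needs at least 3 actions
        # because reset() also presses action 2 once.
        assert env.unwrapped.get_action_meanings()[1] == 'FIRE'
        assert len(env.unwrapped.get_action_meanings()) >= 3
        self.lives = 0

    def reset(self, **kwargs):
        self.env.reset(**kwargs)
        self.lives = self.env.unwrapped.ale.lives()
        # Press FIRE (1), then action 2; reset again if either ends the episode.
        obs, _, done, info = self.env.step(1)
        if done or info.get('needs_reset', False):
            obs = self.env.reset(**kwargs)
        obs, _, done, info = self.env.step(2)
        if done or info.get('needs_reset', False):
            obs = self.env.reset(**kwargs)
        return obs

    def step(self, ac):
        lives = self.env.unwrapped.ale.lives()
        if lives < self.lives:
            # A life was just lost: send FIRE instead of the agent's action
            # so the ball is relaunched and the game does not freeze.
            self.lives = lives
            return self.env.step(1)
        return self.env.step(ac)
```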
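The logic of FireResetEnvAuto can be checked without gym using the same kind of toy setup. `AutoFireSketch` is a hypothetical, gym-free re-statement of its reset() and step() rules, run against the hypothetical `ToyServeEnv`; it reads `env.lives` where the real wrapper calls `env.unwrapped.ale.lives()`. This is a sketch of the idea under those assumptions, not a test of the real Atari environment.

```python
class ToyServeEnv:
    """Hypothetical Breakout stand-in: ball moves only after FIRE (action 1)."""

    def __init__(self, lives=5, rally_len=3):
        self.start_lives = lives
        self.rally_len = rally_len

    def reset(self):
        self.lives, self.served, self.t = self.start_lives, False, 0
        return 0

    def step(self, action):
        if not self.served:
            # Waiting for a launch: only FIRE starts play.
            self.served = action == 1
            return int(self.served), 0.0, False, {"lives": self.lives}
        self.t += 1
        if self.t >= self.rally_len:
            # Ball missed: lose a life and wait for FIRE again.
            self.lives -= 1
            self.served, self.t = False, 0
        return int(self.served), 1.0, self.lives == 0, {"lives": self.lives}


class AutoFireSketch:
    """Gym-free sketch of FireResetEnvAuto's reset/step rules."""

    def __init__(self, env):
        self.env = env
        self.lives = 0

    def reset(self):
        self.env.reset()
        self.lives = self.env.lives  # stands in for env.unwrapped.ale.lives()
        obs, _, done, _ = self.env.step(1)  # FIRE
        if done:
            obs = self.env.reset()
        obs, _, done, _ = self.env.step(2)
        if done:
            obs = self.env.reset()
        return obs

    def step(self, action):
        if self.env.lives < self.lives:
            # A life was lost since the last step: send FIRE instead.
            self.lives = self.env.lives
            return self.env.step(1)
        return self.env.step(action)


def run(env, action, max_steps=50):
    """Repeat one action for up to max_steps; return (total reward, done, lives)."""
    env.reset()
    total, done, info = 0.0, False, {}
    for _ in range(max_steps):
        _, reward, done, info = env.step(action)
        total += reward
        if done:
            break
    return total, done, info["lives"]


# Even an agent that never chooses FIRE now plays through all five lives.
print(run(AutoFireSketch(ToyServeEnv()), action=2))  # → (14.0, True, 0)
```

One consequence of this design: on the step right after a miss, the agent's chosen action is discarded and FIRE is sent instead, so for that single step the recorded transition does not match the action the agent picked.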

Most of this code comes from chainerrl/wrappers/atari_wrappers.py; I only added a little. This wrapper is part of chokozainerrl.