[PYTHON] Reinforcement learning 27 colaboratory 90-minute rule measures chainerRL (+ chokozainerRL)

It is assumed that reinforcement learning 26 has been completed. It is intended for AI beginners from junior high school to university.

google colaboratory is free and can be gpgpu, but it seems that the continuous use time is limited. I haven't confirmed it because I couldn't find it in writing. In fact, it can become unusable during use, so long-term machine learning requires half-way backup and half-way restart.

Well, chainerRL has that mechanism, but as of December 3rd, it can't normally be installed with pip. It's on github, so I think it will be released soon. If you want to use the latest version

!pip install git+https://github.com/chainer/chainerrl

You can install it with. This is a hassle, so I copied only the relevant parts to chokozainerrl.

chokozainerrl is set in args. Set a savepoint with checkpoint_freq. For example, if it is 1000, it will save in 1000, 2000, 3000 and 1000 increments. The saved agent will be in a folder with a name like 3000_checkpoint. Also, if you want to start from the middle, use step_offset. If you want to start from 3000 steps, set step_offset = 3000. Then, specify the agent folder learned in 3000 steps as load_agent. If you use these two well, you can be interrupted in the middle.

chokozainer's github is here. https://github.com/chokozainer/chokozainerrl