[Python] Deep Reinforcement Learning 2: Implementation of Reinforcement Learning

Aidemy 2020/11/22

Introduction

Hello, I'm Yope! I come from a thoroughly non-technical, liberal-arts background, but I became interested in the possibilities of AI, so I enrolled in the AI-specialized school "Aidemy" to study. I'd like to share the knowledge I gained there, so I'm summarizing it on Qiita. I'm very happy that so many people read my previous summary article. Thank you! This is the second post on deep reinforcement learning. Nice to meet you.

What to learn this time
・Implementation of reinforcement learning

Implementation of reinforcement learning

Environment creation

・In the "Reinforcement Learning" chapter, I defined the environment and so on myself, but this time we will create the environment using a library that provides a variety of ready-made reinforcement-learning environments.
・The library we use is __keras-rl__, which combines Keras with __OpenAI Gym (Gym)__. This time we will use it to train a DQN on the CartPole demo.

・First, create the environment. This is done simply with __env = gym.make()__, passing the type of environment as the argument. The CartPole environment is specified as __"CartPole-v0"__. After that, it can be operated through the env instance.
・In CartPole there are two actions, __"move the cart to the right" and "move the cart to the left"__, and you can get this number of actions with __env.action_space.n__.

・Code![Screenshot 2020-11-19 10.31.55.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/d10ad3fb-5063-56c5-aaf1-453352008e27.png)
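Since the code above is only visible as a screenshot, here is a minimal sketch of the environment setup it describes; the names env and nb_actions follow the usual keras-rl CartPole example and are not necessarily those in the screenshot.

```python
import gym

# Create the CartPole environment
env = gym.make("CartPole-v0")

# Number of available actions: 2 (push cart left / push cart right)
nb_actions = env.action_space.n
print(nb_actions)  # -> 2
```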

Model building

・Once the environment is created, use Keras functions to build a __multilayer neural network__. The model is built with the Sequential model.
・__Flatten()__ transforms the multidimensional input into one dimension. For the input shape "input_shape", use __env.observation_space.shape__ to pass the shape of the cart pole's current state.
・Add layers with __model.add()__. A fully connected layer is Dense, and the activation function is specified with Activation(). Pass "relu", "linear", etc. as its argument.

・Code![Screenshot 2020-11-19 10.32.22.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/a8af8cb2-4c1f-3956-62f8-26e9086e06ff.png)
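A minimal sketch of such a model, continuing from the environment snippet above and following the official keras-rl CartPole example; the two hidden layers of 16 units are an assumption, and the leading (1,) in input_shape accounts for the window_length=1 used by the memory set up later.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten

model = Sequential()
# Flatten the (window_length, observation) input into one dimension
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation("relu"))
model.add(Dense(16))
model.add(Activation("relu"))
# One output per action, with a linear activation for the Q-values
model.add(Dense(nb_actions))
model.add(Activation("linear"))
```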

Agent settings 1: History and policy

・Here we configure the __agent__, the main body of reinforcement learning. First, set the __history__ (memory) and __policy__ required for this. (The history is a record of what the agent did in the past.)
・The history can be set with __SequentialMemory(limit, window_length)__. limit is the number of experiences to store.
・For the policy, use __BoltzmannQPolicy()__ to take the Boltzmann policy, and __EpsGreedyQPolicy()__ to take the ε-greedy method.

・Code![Screenshot 2020-11-19 10.41.15.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/b834e4f7-d691-531d-e28c-98eda7808306.png)
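A sketch of the memory and policy setup using keras-rl's standard classes; the limit of 50000 is an assumed value.

```python
from rl.memory import SequentialMemory
from rl.policy import EpsGreedyQPolicy

# History: replay memory storing up to 50000 past steps
memory = SequentialMemory(limit=50000, window_length=1)

# Policy: ε-greedy (BoltzmannQPolicy() could be used instead)
policy = EpsGreedyQPolicy()
```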

Agent settings 2

・Set up the agent using the history and policy from the previous sections. Call __DQNAgent()__, which implements the DQN algorithm, and give it the following arguments.
・The arguments are the model __model__, the history __memory__, the policy __policy__, the number of actions __nb_actions__, and __nb_steps_warmup__, which specifies how many steps at the beginning are collected without being used for learning.
・With the above stored in a variable called "dqn", specify how the agent learns with __dqn.compile()__. The __optimization function__ is given as the first argument, and the evaluation functions, passed as metrics, as the second argument.

・Code (model etc. are the ones from the previous sections)![Screenshot 2020-11-19 11.12.57.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/ff94997e-3e16-3d04-961d-1f9fd389a025.png)
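A sketch of the agent setup, reusing model, memory, policy, and nb_actions from the snippets above; the nb_steps_warmup and learning-rate values are assumptions, and the imports assume the tf.keras-based keras-rl2 package.

```python
from rl.agents.dqn import DQNAgent
from tensorflow.keras.optimizers import Adam

# DQN agent combining the model, memory, and policy defined above
dqn = DQNAgent(model=model, memory=memory, policy=policy,
               nb_actions=nb_actions, nb_steps_warmup=100)

# First argument: the optimizer; second: the evaluation metrics
dqn.compile(Adam(learning_rate=1e-3), metrics=["mae"])
```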

Conducting the test

・To train the dqn agent from the previous section, use __dqn.fit()__. The arguments are the environment (env in the code), the number of training steps __nb_steps__, whether to visualize __visualize__, and whether to output logs __verbose__.
・Once the agent has been trained, run the test. The test runs the agent and evaluates how much reward it actually earns. This is done with __dqn.test()__. The arguments are the same as for dqn.fit(), except that the number of episodes is given as __nb_episodes__.
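A sketch of the training and test calls, continuing from the agent above; the step and episode counts are assumed values.

```python
# Train for 10000 steps without rendering, logging progress
dqn.fit(env, nb_steps=10000, visualize=False, verbose=1)

# Evaluate: run 5 episodes and report the reward earned in each
dqn.test(env, nb_episodes=5, visualize=True)
```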

Summary

・Use keras-rl when performing reinforcement learning with a library.
・After creating the environment with __gym.make()__, build a model and add layers. Also set the __history__ and __policy__, and use these to create an agent.
・Train the created dqn agent with __dqn.fit()__ and test it with __dqn.test()__.

That's all for this time. Thank you for reading this far.
