Try Deep Q Learning with Chainer: Launch

Hello. I'm someone whose main job is not programming; I'm not a programmer.

With the keyword "deep learning" buzzing even on TV, I saw Robot Control with Distributed Deep Reinforcement Learning | Preferred Research and wanted to try it myself. Rather than calling it a clone, I first made a simple version. ⇒ Repository: DeepQNetworkTest

Aim

First time using Python! First time using Chainer too! I barely know how to program, and there are no software people around me! But I want to make a self-propelled machine learn through reinforcement learning! ⇒ If I publish it for now, someone may give me pointers.

There seem to be very few examples (or so I feel) of controlling a machine that has inertia and similar physical properties. ⇒ I'll tackle that in the next step.

What I did

- Made a clone of something like the ConvNetJS "Deep Q Learning Reinforcement Learning with Neural Network" demo
  - I made the GUI with Python!
  - I implemented Deep Q Learning (ε-greedy)! (It's closer to copying sutras by hand, and it isn't deep at all.)
demo.png

Program overview

Environment

Red apples and poisoned apples are placed in a garden enclosed by an outer frame and inner walls. The artificial intelligence wants to eat as many red apples as possible and does not want to eat poisoned apples.

Outer frame and inner wall

Obstacles that block the artificial intelligence's movement and field of view. The artificial intelligence likes to have an open view.
001.png

Red apple / poisoned apple

Touching a red apple gives a reward; touching a poisoned apple gives a penalty.
aka.png doku.png

Artificial intelligence

A blue dot with a 300 px field of view spanning 120° ahead.
- The field of view consists of nine eyes, each blocked by apples and walls (only the closest object is visible).
- It keeps moving at a constant speed.
- The available actions are: sharp right turn, right turn, go straight, left turn, sharp left turn.
- It prefers to go straight when its field of view is wide open.
agent.png
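The agent's vision can be sketched as ray casting: each of the nine eyes sends a ray up to 300 px, and only the nearest hit (wall or apple) is reported. This is a minimal sketch in plain Python, not the author's code; it assumes apples are circles, walls are line segments, and the eyes are spread evenly across the 120° field of view. The names `sense`, `ray_circle`, and `ray_segment` are hypothetical.

```python
import math

FOV = math.radians(120)   # total field of view (from the article)
NUM_EYES = 9              # number of eyes (from the article)
MAX_RANGE = 300.0         # sensing range in px (from the article)


def eye_angles(heading):
    """Directions (radians) of the eyes, spread evenly across the field of view."""
    step = FOV / (NUM_EYES - 1)
    return [heading - FOV / 2 + k * step for k in range(NUM_EYES)]


def ray_segment(px, py, dx, dy, seg):
    """Distance along the ray p + t*d to the wall segment seg, or None if missed."""
    (x1, y1), (x2, y2) = seg
    ex, ey = x2 - x1, y2 - y1
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-12:  # ray parallel to the wall: treat as no hit
        return None
    t = ((x1 - px) * ey - (y1 - py) * ex) / denom
    u = ((x1 - px) * dy - (y1 - py) * dx) / denom
    return t if t >= 0 and 0 <= u <= 1 else None


def ray_circle(px, py, dx, dy, cx, cy, r):
    """Distance along a unit-direction ray to a circular apple, or None if missed."""
    fx, fy = px - cx, py - cy
    b = dx * fx + dy * fy
    disc = b * b - (fx * fx + fy * fy - r * r)
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):  # nearer intersection first
        if t >= 0:
            return t
    return None


def sense(px, py, heading, walls, apples):
    """Return (distance, kind) of the closest object seen by each eye.

    kind is 'wall', 'red', 'poison', or None when nothing is within MAX_RANGE
    (distance is then MAX_RANGE). Only the nearest hit counts, so closer
    objects hide whatever lies behind them.
    """
    readings = []
    for ang in eye_angles(heading):
        dx, dy = math.cos(ang), math.sin(ang)
        best = (MAX_RANGE, None)
        for seg in walls:
            t = ray_segment(px, py, dx, dy, seg)
            if t is not None and t < best[0]:
                best = (t, 'wall')
        for cx, cy, r, kind in apples:
            t = ray_circle(px, py, dx, dy, cx, cy, r)
            if t is not None and t < best[0]:
                best = (t, kind)
        readings.append(best)
    return readings


# Hypothetical scene: a wall along x = 0 and one red apple between it and the agent
walls = [((0.0, -500.0), (0.0, 500.0))]
apples = [(50.0, 0.0, 10.0, 'red')]
readings = sense(100.0, 0.0, math.pi, walls, apples)  # agent at (100, 0) facing -x
```

Keeping only the closest hit per eye is what makes an apple hide the wall behind it, matching "only the closest one can be seen".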

Reinforcement learning

Neural net

The network uses ReLU, with 59 inputs, two hidden layers of 50 units each, and 5 outputs (same as the original).
network.png
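A forward pass through the 59-50-50-5 ReLU network can be sketched in NumPy as below. Where the 59 inputs come from is an assumption here: ConvNetJS's rldemo, as far as I know, feeds 9 eyes × 3 object types = 27 sensor values for the current step, plus the previous step's 27 values and a one-hot encoding of the previous action (5), which adds up to 59. The helper names are hypothetical, and this is plain NumPy rather than Chainer; in Chainer the same layers would typically be three `chainer.links.Linear` links with `chainer.functions.relu` between them.

```python
import numpy as np

NUM_EYES, NUM_TYPES, NUM_ACTIONS = 9, 3, 5
SENSOR_DIM = NUM_EYES * NUM_TYPES                  # 27 values per time step
INPUT_DIM = SENSOR_DIM + SENSOR_DIM + NUM_ACTIONS  # 59 (assumed layout)
HIDDEN = 50


def make_input(sensors, prev_sensors, prev_action):
    """59-dim input: current sensors, previous sensors, one-hot previous action."""
    onehot = np.zeros(NUM_ACTIONS, dtype=np.float32)
    onehot[prev_action] = 1.0
    return np.concatenate([sensors, prev_sensors, onehot]).astype(np.float32)


def init_params(rng):
    """He-initialized weights for 59 -> 50 -> 50 -> 5."""
    sizes = [INPUT_DIM, HIDDEN, HIDDEN, NUM_ACTIONS]
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)).astype(np.float32)
        b = np.zeros(n_out, dtype=np.float32)
        params.append((W, b))
    return params


def q_values(params, x):
    """Two ReLU hidden layers, linear output: one Q-value per action.

    Works for a single input (59,) or a batch (N, 59).
    """
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return h @ W + b


rng = np.random.default_rng(0)
params = init_params(rng)
x = make_input(np.ones(SENSOR_DIM, np.float32), np.zeros(SENSOR_DIM, np.float32), 2)
q = q_values(params, x)  # shape (5,)
```

The output layer has no activation because Q-values can be any real number, including negative ones after eating poisoned apples.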

Learning

Up to 30,000 experiences are stored, and the network is trained on mini-batches drawn from them (experience replay). This is the commonly seen way of training. I haven't done anything fashionable such as Double DQN or LSTM.
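Storing up to 30,000 experiences and sampling mini-batches from them could look like the ring buffer below. It keeps the same row layout as the `eMem` array in DQN001.py, `[ state..., action, reward, new_state... ]`; the class name and the overwrite-the-oldest policy are assumptions, not necessarily what the repository does.

```python
import numpy as np


class ReplayMemory:
    """Fixed-size ring buffer of experiences stored as float32 rows."""

    def __init__(self, state_dim, capacity=30000, seed=None):
        self.state_dim = state_dim
        self.capacity = capacity
        # One row per experience: [state..., action, reward, next_state...]
        self.data = np.zeros((capacity, 2 * state_dim + 2), dtype=np.float32)
        self.size = 0  # number of valid rows
        self.pos = 0   # next row to write
        self.rng = np.random.default_rng(seed)

    def push(self, state, action, reward, next_state):
        row = self.data[self.pos]  # a view, so writes go into self.data
        row[:self.state_dim] = state
        row[self.state_dim] = action
        row[self.state_dim + 1] = reward
        row[self.state_dim + 2:] = next_state
        # Once full, the oldest experience is overwritten first
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        """Uniformly sample distinct rows from the filled part of the buffer."""
        idx = self.rng.choice(self.size, size=batch_size, replace=False)
        return self.data[idx]
```

Sampling at random from a large memory breaks the correlation between consecutive steps, which is the usual motivation for experience replay.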

What worked / what's strange

- The artificial intelligence learns little by little and starts eating red apples.
- It oddly likes to stick to the walls.
- It also seems to actively go after poisoned apples. Is the ε of ε-greedy set wrong?
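On whether ε is off: if ε stays large, the agent keeps taking random actions and will run into poisoned apples by chance no matter what it has learned. A common remedy is to anneal ε from 1 down to a small floor, as in this minimal sketch. The schedule and its parameters (`eps_start`, `eps_min`, `decay_steps`) are hypothetical values, not taken from the original demo.

```python
import numpy as np


def epsilon_at(step, eps_start=1.0, eps_min=0.05, decay_steps=20000):
    """Linearly anneal epsilon from eps_start to eps_min over decay_steps steps."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_min - eps_start)


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy: a random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))              # exploit


rng = np.random.default_rng(0)
q = np.array([0.1, 0.9, 0.3, 0.2, 0.0])  # hypothetical Q-values for the 5 actions
action = select_action(q, epsilon_at(50000), rng)
```

When watching the trained agent, fixing ε at the floor value (or 0) makes it easier to tell learned behavior apart from random exploration.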

It may be necessary to adjust how actions are rewarded. Another problem is that the progress of learning isn't visualized.
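Adjusting the rewards is easier when the reward is a single explicit function. The sketch below combines the ingredients listed under Program overview: a reward for red apples, a penalty for poisoned apples, a preference for an open view, and a bonus for going straight when the view is open. All weights, the 0.75 threshold, and the index used for "go straight" are hypothetical and not the author's values.

```python
def compute_reward(eye_proximities, action, ate_red, ate_poison,
                   straight_action=2, red_bonus=1.0, poison_penalty=-1.0,
                   forward_bonus=0.1, open_threshold=0.75):
    """Hypothetical reward combining open view, going straight, and apples.

    eye_proximities: per-eye distance / max range, in [0, 1] (1.0 = nothing seen).
    """
    view = sum(eye_proximities) / len(eye_proximities)  # 1.0 = completely open
    reward = view                                       # likes an open view
    if action == straight_action and view > open_threshold:
        reward += forward_bonus * view                  # going straight in the open
    if ate_red:
        reward += red_bonus
    if ate_poison:
        reward += poison_penalty
    return reward
```

If the apple terms dominate the open-view term, for example, hugging a wall may cost the agent very little, so the balance between these weights is one place to experiment.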

Please tell me what's wrong and help me out! (2016/04/22)

How to use NumPy

In Chainer memo 11 When the speed does not come out on GPU --studylog / Northern clouds, there is this passage:

"For those who crunch numbers with numpy every day, this code is bad enough to make them spit out their tea, but until a while ago I often mixed in code like this."

That article itself is about cupy, but even limited to numpy I don't know the right way to do it. If anything about how I've written the code below is strange, or could be written to run faster, I'd like to know.

DQN001.py


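        # Sample a random mini-batch of experiences from the replay memory
        memsize     = self.eMem.shape[0]
        batch_index = np.random.permutation(memsize)[:self.batch_num]
        batch       = np.array(self.eMem[batch_index], dtype=np.float32).reshape(self.batch_num, -1)

        # Q-values of the current states; only the entry of the action taken is replaced below
        x = Variable(batch[:,0:STATE_DIM])
        targets = self.model.predict(x).data.copy()

        # Per-sample loop: get_action_value is called once for every transition
        for i in range(self.batch_num):
            #[ state..., action, reward, seq_new]
            a = int(batch[i,STATE_DIM])
            r = batch[i, STATE_DIM+1]

            new_seq= batch[i,(STATE_DIM+2):(STATE_DIM*2+2)]

            # Q-learning target: r + gamma * max over a' of Q(new_seq, a')
            targets[i,a]=( r + self.gamma * np.max(self.get_action_value(new_seq)))

        # Teacher signal for the loss
        t = Variable(np.array(targets, dtype=np.float32).reshape((self.batch_num,-1)))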

Should I consider an implementation that turns the body of the for loop into a vectorized operation?
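The loop can be vectorized: slice all actions, rewards, and next states out of the batch at once, evaluate Q for every next state in one batched call, and write the targets with NumPy fancy indexing (`targets[rows, actions] = ...`). In this sketch, `q_fn` is a hypothetical stand-in for the network that maps an (N, state_dim) array to (N, num_actions) Q-values; with Chainer it would presumably correspond to `self.model.predict(Variable(next_states)).data`, since `predict` already takes a whole batch for `x` in the code above. The per-sample loop is kept only to check that both versions agree.

```python
import numpy as np


def compute_targets(batch, q_fn, state_dim, gamma):
    """Vectorized DQN targets for a mini-batch.

    batch rows: [state..., action, reward, next_state...] (same layout as eMem).
    q_fn: maps an (N, state_dim) array to an (N, num_actions) array of Q-values.
    """
    states = batch[:, :state_dim]
    actions = batch[:, state_dim].astype(np.int64)
    rewards = batch[:, state_dim + 1]
    next_states = batch[:, state_dim + 2:2 * state_dim + 2]

    targets = q_fn(states).copy()             # keeps Q for the actions not taken
    next_max = q_fn(next_states).max(axis=1)  # one batched pass for all next states
    rows = np.arange(batch.shape[0])
    targets[rows, actions] = rewards + gamma * next_max  # replaces the Python loop
    return targets.astype(np.float32)


def compute_targets_loop(batch, q_fn, state_dim, gamma):
    """Original per-sample loop, for comparison only."""
    targets = q_fn(batch[:, :state_dim]).copy()
    for i in range(batch.shape[0]):
        a = int(batch[i, state_dim])
        r = batch[i, state_dim + 1]
        new_seq = batch[i, state_dim + 2:2 * state_dim + 2]
        targets[i, a] = r + gamma * np.max(q_fn(new_seq[None, :]))
    return targets.astype(np.float32)
```

Besides removing the Python-level loop, this replaces `batch_num` separate forward passes with a single one, which is likely to matter even more on GPU, where each small call carries overhead.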

How to use wxPython

I don't quite understand the parent-child relationship between Frame and Panel, or how to handle the device context (dc). I want to add a graph at the bottom of the screen (under construction). ⇒ wxPython: Simultaneous drawing of animation and graph drawing --Qiita

voglio001.png

Update: distributed learning and graphs were added later.

It looks like fireflies flying around.
003.gif

References

Note

This article will be expanded and revised little by little.
