I'm posting this because I got OpenAI Gym's CartPole-v0 working nicely with PD control. A hand-tuned controller is not what Gym is really intended for, but I think it's good enough as a demo for learning how to use the library.
The control algorithm is PD control with the output clipped to its sign (i.e., -1 or +1): the action depends only on whether m = kp × (pole angle) + kd × (angular velocity) is positive or negative.
The code is below; detailed explanations are written as comments in the code. It is also available on GitHub.
cart_pole_pd.py
```python
# coding: utf-8
# Demo of CartPole-v0 using a PD control agent with sign-clipped output
#
# Meaning of the values:
#   observation : [cart position (units unknown), cart velocity (units unknown),
#                  pole angle (rad), pole angular velocity (units unknown)]
#   action      : 0 -> force -1, 1 -> force +1
#   step        : 1 step = 0.02 s in Gym's implementation (so 50 steps ≈ 1 second)
#   Exit conditions: the pole tilts more than 15 degrees, the cart moves more
#                    than 2.4 (units unknown) from the center, or 200 steps elapse
import agents
import gym
from gym import wrappers

video_path = './video'  # path to save the video
n_episode = 1           # number of episodes
n_step = 200            # number of steps

# PD control parameters (by the way, P control alone does not work).
# Since the output is clipped to its sign, only the ratio kp:kd matters.
kp = 0.1
kd = 0.01

myagent = agents.PDAgent(kp, kd)  # PD control agent with sign-clipped output
env = gym.make('CartPole-v0')     # create the environment
# Wrapper class that records videos of the environment to the given directory
# force=True: automatically clear previous monitor files
env = wrappers.Monitor(env, video_path, force=True)

for i_episode in range(n_episode):
    observation = env.reset()  # initialize the environment & get the initial observation
    for t in range(n_step):
        env.render()  # show the environment (it is displayed even without Monitor)
        print(observation)
        action = myagent.action(observation)  # get the action from the agent class
        observation, reward, done, info = env.step(action)  # advance one step
        if done:  # episode-finished flag
            print('Episode finished after {} timesteps'.format(t + 1))
            break
```
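Incidentally, if you want to sanity-check gains before recording a video, you can run one episode per gain pair without the Monitor wrapper and see how long the pole stays up. This is a minimal sketch, not part of the original post; it assumes the agents.py listed next is importable, and the gain pairs are illustrative, not tuned:

```python
# Gain-sweep sketch (assumed, not from the original post): runs one
# episode per gain pair and prints how long the pole stayed up.
import agents
import gym

env = gym.make('CartPole-v0')
for kp, kd in [(0.1, 0.01), (0.1, 0.05), (1.0, 0.1)]:  # illustrative gain pairs
    agent = agents.PDAgent(kp, kd)
    observation = env.reset()
    for t in range(200):
        observation, reward, done, info = env.step(agent.action(observation))
        if done:
            break
    print('kp={}, kd={}: lasted {} timesteps'.format(kp, kd, t + 1))
```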
agents.py
```python
# coding: utf-8
import random

# Random agent (observation is ignored; the argument is kept so the
# interface matches PDAgent)
class RandomAgent():
    def action(self, observation):
        return random.randint(0, 1)  # returns a random integer in {0, 1}

# PD control agent with output clipped to its sign
class PDAgent():
    def __init__(self, kp, kd):
        self.kp = kp
        self.kd = kd

    def action(self, observation):
        # control effort from the pole angle (observation[2])
        # and its angular velocity (observation[3])
        m = self.kp * observation[2] + self.kd * observation[3]
        return 1 if m >= 0 else 0
```
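As the comment in cart_pole_pd.py says, P control alone does not work. If you want to check that for yourself, a P-only agent is a small change; the PAgent class below is a hypothetical addition for comparison, not part of the original code:

```python
# Hypothetical P-only agent for comparison (not in the original code).
# Without the derivative term the controller reacts only to the current
# angle; the post notes this fails to keep the pole balanced.
class PAgent():
    def __init__(self, kp):
        self.kp = kp

    def action(self, observation):
        m = self.kp * observation[2]  # control effort from the pole angle only
        return 1 if m >= 0 else 0
```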