I'm posting this because I got OpenAI Gym's CartPole-v0 working nicely with PD control. A hand-tuned controller is not what Gym is really intended for, but I think it's good enough as a demo for learning how to use the library.
The control algorithm is PD control with the output clipped to its sign (i.e., -1 or +1): the action depends only on whether m = kp × (pole angle) + kd × (angular velocity) is positive or negative.
The code is below; detailed explanations are written as comments in the code. It is also available on GitHub.
cart_pole_pd.py
```python
# coding: utf-8
# Demo of CartPole-v0 using a PD control agent with sign-clipped output
#
# Meaning of the values:
#   observation : [cart position (units unknown), cart velocity (units unknown),
#                  pole angle (rad), pole angular velocity (units unknown)]
#   action      : 0 -> force -1, 1 -> force +1
#   step        : 1 step = 0.02 s in Gym's implementation (so 50 steps ≈ 1 second)
#   Exit conditions: the pole tilts more than 15 degrees, the cart moves more
#                    than 2.4 (units unknown) from the center, or 200 steps elapse
import agents
import gym
from gym import wrappers

video_path = './video'  # path to save the video
n_episode = 1           # number of episodes
n_step = 200            # number of steps

# PD control parameters (by the way, P control alone does not work).
# Since the output is clipped to its sign, only the ratio kp:kd matters.
kp = 0.1
kd = 0.01

myagent = agents.PDAgent(kp, kd)  # PD control agent with sign-clipped output
env = gym.make('CartPole-v0')     # create the environment
# Wrapper class that records videos of the environment to the given directory
# force=True: automatically clear previous monitor files
env = wrappers.Monitor(env, video_path, force=True)

for i_episode in range(n_episode):
    observation = env.reset()  # initialize the environment & get the initial observation
    for t in range(n_step):
        env.render()  # show the environment (it is displayed even without Monitor)
        print(observation)
        action = myagent.action(observation)  # get the action from the agent class
        observation, reward, done, info = env.step(action)  # advance one step
        if done:  # episode-finished flag
            print('Episode finished after {} timesteps'.format(t + 1))
            break
```
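Incidentally, if you want to sanity-check gains before recording a video, you can run one episode per gain pair without the Monitor wrapper and see how long the pole stays up. This is a minimal sketch, not part of the original post; it assumes the agents.py listed next is importable, and the gain pairs are illustrative, not tuned:

```python
# Gain-sweep sketch (assumed, not from the original post): runs one
# episode per gain pair and prints how long the pole stayed up.
import agents
import gym

env = gym.make('CartPole-v0')
for kp, kd in [(0.1, 0.01), (0.1, 0.05), (1.0, 0.1)]:  # illustrative gain pairs
    agent = agents.PDAgent(kp, kd)
    observation = env.reset()
    for t in range(200):
        observation, reward, done, info = env.step(agent.action(observation))
        if done:
            break
    print('kp={}, kd={}: lasted {} timesteps'.format(kp, kd, t + 1))
```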
agents.py
```python
# coding: utf-8
import random

# Random agent (observation is ignored; the argument is kept so the
# interface matches PDAgent)
class RandomAgent():
    def action(self, observation):
        return random.randint(0, 1)  # returns a random integer in {0, 1}

# PD control agent with output clipped to its sign
class PDAgent():
    def __init__(self, kp, kd):
        self.kp = kp
        self.kd = kd

    def action(self, observation):
        # control effort from the pole angle (observation[2])
        # and its angular velocity (observation[3])
        m = self.kp * observation[2] + self.kd * observation[3]
        return 1 if m >= 0 else 0
```
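As the comment in cart_pole_pd.py says, P control alone does not work. If you want to check that for yourself, a P-only agent is a small change; the PAgent class below is a hypothetical addition for comparison, not part of the original code:

```python
# Hypothetical P-only agent for comparison (not in the original code).
# Without the derivative term the controller reacts only to the current
# angle; the post notes this fails to keep the pole balanced.
class PAgent():
    def __init__(self, kp):
        self.kp = kp

    def action(self, observation):
        m = self.kp * observation[2]  # control effort from the pole angle only
        return 1 if m >= 0 else 0
```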