Following the environment setup in the previous article, this time we will actually run reinforcement learning in Unity! The previous article is here: https://qiita.com/nol_miryuu/items/32cda0f5b7172197bb09
Some basic knowledge of Unity is required (how to create and name objects)
Train the AI so that the blue sphere (the agent) quickly approaches the yellow box (the Target) without falling off the floor.
State: Vector Observation (size = 8)
- Target's X, Y, Z coordinates (3 values)
- RollerAgent's X, Y, Z coordinates (3 values)
- RollerAgent's X and Z velocities (2 values; the Y velocity is excluded because the agent does not move in the Y direction)

Action: Continuous (size = 2)
- 0: force applied to the RollerAgent in the X direction
- 1: force applied to the RollerAgent in the Z direction

Reward:
- When the RollerAgent approaches the Target (the distance between them approaches 0), a reward of +1.0 is given and the episode ends.
- When the RollerAgent falls off the floor (its Y position becomes less than 0), the episode ends without any reward.

Decision:
- Every 10 steps
Reinforcement learning cycle (the process executed at each step): state acquisition → action decision → action execution and reward acquisition → policy update
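For orientation, the following sketch shows how one pass of this cycle maps onto the ML-Agents Agent callbacks implemented later in this article (the policy update itself is performed by the external Python trainer, not by your C# code):

// One step of the reinforcement learning cycle, mapped to the Agent callbacks (sketch)
//   State acquisition            -> CollectObservations(VectorSensor sensor)
//   Action decision              -> the policy decides vectorAction (outside your C# code)
//   Action execution and reward  -> OnActionReceived(float[] vectorAction) + AddReward(...)
//   Policy update                -> performed by mlagents-learn (Python) during training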
Press the "+" button in the upper left of the Package Manager window and select "Add package from disk..."
Go to the directory you created last time and select ml-agents/com.unity.ml-agents/package.json
Add the following components to the RollerAgent:
- Rigidbody: enables physics simulation
- Behavior Parameters: holds the agent's observation (state) and action settings
- Decision Requester: sets how many steps pass between requests for a "decision". A step is executed every 0.02 seconds by default, so with a Decision Period of 5 a decision is requested every 5 × 0.02 = 0.1 seconds, and with 10, every 10 × 0.02 = 0.2 seconds (a small sketch for checking this interval follows after the settings below).
Finally, configure the RollerAgent as shown in the figures below.
Rigidbody
Behavior Parameters
- Behavior Name: RollerBall (the trained model is generated with this name)
- Vector Observation > Space Size: 8 (number of observed values)
- Vector Action > Space Type: Continuous (type of action)
- Vector Action > Space Size: 2 (number of actions)
Decision Requester
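Incidentally, the decision interval above is simply Decision Period × Time.fixedDeltaTime (0.02 s by default). Below is a minimal sketch for checking it in the Console; the component name DecisionIntervalLogger is a hypothetical example of mine, not part of ML-Agents:

using UnityEngine;

// Hypothetical helper (not part of ML-Agents): logs how often a "decision" is
// requested for a given Decision Period, assuming the default physics timestep of 0.02 s.
public class DecisionIntervalLogger : MonoBehaviour
{
    public int decisionPeriod = 10;

    void Start()
    {
        // Time.fixedDeltaTime is 0.02 s by default, so 10 x 0.02 = 0.2 s
        Debug.Log($"Decision requested every {decisionPeriod * Time.fixedDeltaTime} seconds");
    }
}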
The Agent class provides the following methods to override:
- Initialize(): called only once, when the agent's game object is created
- OnEpisodeBegin(): called at the beginning of each episode
- CollectObservations(VectorSensor sensor): sets the observation (state) data that the agent passes to the policy
- OnActionReceived(float[] vectorAction): executes the decided action, gives the reward, and ends the episode
RollerAgent.cs
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class RollerAgent : Agent
{
    public Transform target;
    Rigidbody rBody;

    //Called only once, when the agent game object is created
    public override void Initialize()
    {
        this.rBody = GetComponent<Rigidbody>();
    }

    //Called at the beginning of the episode
    public override void OnEpisodeBegin()
    {
        if (this.transform.position.y < 0) //Reset the following when the RollerAgent (sphere) has fallen off the floor
        {
            this.rBody.angularVelocity = Vector3.zero; //Reset angular velocity
            this.rBody.velocity = Vector3.zero;        //Reset velocity
            this.transform.position = new Vector3(0.0f, 0.5f, 0.0f); //Reset position
        }

        //Reset the position of the Target (cube)
        target.position = new Vector3(Random.value * 8 - 4, 0.5f, Random.value * 8 - 4);
    }

    //Set the observation data (8 values) to be passed to the policy
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(target.position);         //XYZ coordinates of the Target (cube)
        sensor.AddObservation(this.transform.position); //XYZ coordinates of the RollerAgent
        sensor.AddObservation(rBody.velocity.x);        //RollerAgent velocity along the X axis
        sensor.AddObservation(rBody.velocity.z);        //RollerAgent velocity along the Z axis
    }

    //Called when performing an action
    public override void OnActionReceived(float[] vectorAction)
    {
        //Apply force to the RollerAgent
        Vector3 controlSignal = Vector3.zero;
        controlSignal.x = vectorAction[0]; //Action decided by the policy: force applied in the X direction (-1.0 to +1.0)
        controlSignal.z = vectorAction[1]; //Action decided by the policy: force applied in the Z direction (-1.0 to +1.0)
        rBody.AddForce(controlSignal * 10);

        //Measure the distance between the RollerAgent and the Target
        float distanceToTarget = Vector3.Distance(this.transform.position, target.position);

        //When the RollerAgent reaches the Target
        if (distanceToTarget < 1.42f)
        {
            AddReward(1.0f); //Give a reward
            EndEpisode();    //End the episode
        }

        //When the RollerAgent falls off the floor
        if (this.transform.position.y < 0)
        {
            EndEpisode(); //End the episode without giving a reward
        }
    }
}
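Before training, it can help to drive the agent manually to verify the scene. The following is a minimal sketch assuming the same legacy float[] API as the code above: add this method inside the RollerAgent class and set Behavior Type to "Heuristic Only" in Behavior Parameters to use it.

    //Sketch: manual control for testing. These values are passed to OnActionReceived
    //when Behavior Type is set to "Heuristic Only"
    public override void Heuristic(float[] actionsOut)
    {
        actionsOut[0] = Input.GetAxis("Horizontal"); //Force in the X direction (-1.0 to +1.0)
        actionsOut[1] = Input.GetAxis("Vertical");   //Force in the Z direction (-1.0 to +1.0)
    }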
Max Step: the episode is forcibly ended when the number of steps in the episode exceeds this value. In the RollerAgent's Inspector, set Max Step to 1000 and assign the yellow box "Target" to the Target field.
- Create a sample directory in ml-agents/config/
- Create a RollerBall.yaml file in it with the following contents

Hyperparameters (the training configuration file, extension .yaml):
- Parameters used for training
- They need to be tuned by a human
- The available settings differ for each reinforcement learning algorithm (PPO / SAC)
RollerBall.yaml
behaviors:
  RollerBall:
    trainer_type: ppo
    summary_freq: 1000
    hyperparameters:
      batch_size: 10
      buffer_size: 100
      learning_rate: 0.0003
      learning_rate_schedule: linear
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: true
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
Activate the virtual environment created in the previous Qiita article
terminal
poetry shell
Execute the following command in the ml-agents directory
terminal
mlagents-learn config/sample/RollerBall.yaml --run-id=model01
The trailing model01 is the run ID; give each new training run its own unique ID.
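If you interrupt a run and want to continue it later under the same run ID, mlagents-learn also accepts a --resume flag (and --force to overwrite an existing run ID), for example:
terminal
mlagents-learn config/sample/RollerBall.yaml --run-id=model01 --resume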
terminal
Start training by pressing the Play button in the Unity Editor.
When the above message appears in the terminal, go back to Unity and press the Play button to start training.
Training statistics are printed to the terminal every summary_freq steps (1,000 in the configuration above). Mean Reward is the average reward: the higher the value, the better the agent performs. When it reaches about 1.0, you can finish the training.