Python + Unity Reinforcement Learning (Learning)

Following the environment setup in the previous article, this time we actually run reinforcement learning with Unity!


Some basic knowledge of Unity is required (how to create and name objects).


Goal: train an AI so that the blue sphere (the AI agent, RollerAgent) quickly approaches the yellow box (Target) without falling off the floor.


State: Vector Observation (size = 8)
・ Target's X, Y, Z coordinates (3 values)
・ RollerAgent's X, Y, Z coordinates (3 values)
・ RollerAgent's X and Z velocities (2 values; the Y velocity is excluded because the agent does not move in the Y direction)
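As a rough illustration of how the 8 observation values are laid out, here is a minimal Python sketch. The function name `collect_observations` and the sample coordinates are made up for this example; the actual collection happens in the C# script later in this article.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def collect_observations(target_pos: Vec3, agent_pos: Vec3, agent_vel: Vec3) -> List[float]:
    """Flatten the state into the 8-value observation vector described above."""
    obs: List[float] = []
    obs.extend(target_pos)    # Target X, Y, Z (3 values)
    obs.extend(agent_pos)     # RollerAgent X, Y, Z (3 values)
    obs.append(agent_vel[0])  # RollerAgent X velocity
    obs.append(agent_vel[2])  # RollerAgent Z velocity (the Y velocity is skipped)
    return obs


# Sample values chosen only for illustration
observation = collect_observations((1.0, 0.5, -2.0), (0.0, 0.5, 0.0), (0.3, 0.0, -0.1))
print(len(observation))  # 8
```

The length matches the Space Size of Vector Observation (8) that is set on the Behavior Parameters component.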

Action: Continuous (size = 2)
・ 0: Force applied to the RollerAgent in the X direction
・ 1: Force applied to the RollerAgent in the Z direction

Reward:
・ When the RollerAgent reaches the Target (the distance between the RollerAgent and the Target approaches 0), a reward of +1.0 is given and the episode ends.
・ When the RollerAgent falls off the floor (its Y position drops below 0), the episode ends without a reward.
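The reward rule can be sketched in Python as follows. This is only an illustration: the function name `step_outcome` is hypothetical, and the distance threshold 1.42 is the value used in the RollerAgent script shown later in this article.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

REACH_DISTANCE = 1.42  # threshold used by the RollerAgent script in this article


def step_outcome(agent_pos: Vec3, target_pos: Vec3) -> Tuple[float, bool]:
    """Return (reward, episode_done) for one step, following the rules above."""
    reward, done = 0.0, False
    # Reaching the Target: reward +1.0 and end the episode
    if math.dist(agent_pos, target_pos) < REACH_DISTANCE:
        reward, done = 1.0, True
    # Falling off the floor: end the episode without a reward
    if agent_pos[1] < 0:
        done = True
    return reward, done


print(step_outcome((0.0, 0.5, 0.0), (1.0, 0.5, 0.0)))   # close to the Target
print(step_outcome((5.0, -0.1, 0.0), (0.0, 0.5, 0.0)))  # fell off the floor
```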

Decision:
・ Every 10 steps

Reinforcement learning cycle (the process executed at each step): State acquisition → Action decision → Action execution and reward acquisition → Policy update
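To make the order of the cycle concrete, here is a toy Python sketch. Everything in it (the 1-D state, the random stand-in for a policy, the reward rule, the function name `run_episode`) is invented for illustration and is not how ML-Agents is implemented; in particular, a real trainer typically updates the policy after collecting a buffer of experience rather than at every step.

```python
import random
from typing import List, Tuple


def run_episode(max_steps: int = 5, seed: int = 0) -> Tuple[List[str], float]:
    """Run a toy episode and return the phase order and the total reward."""
    rng = random.Random(seed)
    phases: List[str] = []
    position = 0.0      # hypothetical 1-D state
    total_reward = 0.0
    for _ in range(max_steps):
        state = position                   # 1. state acquisition
        phases.append("observe")
        action = rng.uniform(-1.0, 1.0)    # 2. action decision (random stand-in for a policy)
        phases.append("decide")
        position = state + action          # 3. action execution ...
        reward = 1.0 if abs(position) > 1.0 else 0.0  # ... and reward acquisition
        total_reward += reward
        phases.append("act")
        phases.append("update")            # 4. policy update (placeholder, nothing is learned here)
    return phases, total_reward


phases, total = run_episode()
print(phases[:4])
```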

Preparing the learning environment

1. Place the blue sphere (name = RollerAgent).

2. Place the yellow box (name = Target).

3. Place the floor (name = Floor).

4. Main Camera: adjust the position and angle of the camera so that the whole scene is clearly visible.

5. Create a Material to color each object (Assets > Create).

6. Import ML-Agents: select Package Manager from Window in the menu.

Press the "+" button in the upper left and select "Add package from disk".

Go to the directory you created last time and select ml-agents / / package.json

7. Add components (features) to the RollerAgent (blue sphere)

・ Rigidbody: enables physics simulation
・ Behavior Parameters: sets the RollerAgent's state (observation) and action data
・ Decision Requester: sets how many steps pass between "decision" requests. A step is basically executed every 0.02 seconds. If the Decision Period is 5, a decision is requested every 5 x 0.02 = 0.1 seconds; if it is 10, every 10 x 0.02 = 0.2 seconds.

Finally, configure the RollerAgent's components as follows.
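The Decision Period arithmetic can be checked with a few lines of Python. The helper name `decision_interval` is made up for this sketch, and the 0.02-second step is the value stated above.

```python
FIXED_STEP_SECONDS = 0.02  # step interval stated above


def decision_interval(decision_period: int, step_seconds: float = FIXED_STEP_SECONDS) -> float:
    """Seconds between two decisions for a given Decision Period."""
    return decision_period * step_seconds


for period in (5, 10):
    print(period, round(decision_interval(period), 2))
```

A larger Decision Period means fewer decisions per second, so the same action stays in effect for longer between decisions.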

Rigidbody
[Screenshot: Rigidbody settings]

Behavior Parameters
・ Behavior Name: RollerBall (the model is generated with this name)
・ Space Size of Vector Observation: 8 (number of observed values)
・ Space Type: Continuous (type of action)
・ Space Size of Vector Action: 2 (number of action values)

Decision Requester
・ Decision Period: 10 (a decision every 10 steps, as described in the Decision item above)

8. Create the RollerAgent.cs script (the file name must match the class name RollerAgent)

・ Initialize() ... Called only once, when the agent game object is created.
・ OnEpisodeBegin() ... Called at the beginning of each episode.
・ CollectObservations(VectorSensor sensor) ... Sets the state data to be passed to the agent.
・ OnActionReceived(float[] vectorAction) ... Executes the decided action, gives the reward, and ends the episode when appropriate.


using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class RollerAgent : Agent
{
    public Transform target; // Assign the Target (cube) in the Inspector
    Rigidbody rBody;

    // Called only once when the agent game object is created
    public override void Initialize()
    {
        this.rBody = GetComponent<Rigidbody>();
    }

    // Called at the beginning of the episode
    public override void OnEpisodeBegin()
    {
        // Reset the following when the RollerAgent (sphere) has fallen off the floor
        if (this.transform.position.y < 0)
        {
            this.rBody.angularVelocity = Vector3.zero; // Reset angular velocity
            this.rBody.velocity = Vector3.zero;        // Reset velocity
            this.transform.position = new Vector3(0.0f, 0.5f, 0.0f); // Reset position
        }

        // Reset the position of the Target (cube)
        target.position = new Vector3(Random.value * 8 - 4, 0.5f, Random.value * 8 - 4);
    }

    // Set the observation data (8 items) to be passed to the agent
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(target.position);         // XYZ coordinates of the Target (cube)
        sensor.AddObservation(this.transform.position); // XYZ coordinates of the RollerAgent
        sensor.AddObservation(rBody.velocity.x);        // X-axis velocity of the RollerAgent
        sensor.AddObservation(rBody.velocity.z);        // Z-axis velocity of the RollerAgent
    }

    // Called when performing an action
    public override void OnActionReceived(float[] vectorAction)
    {
        // Apply force to the RollerAgent using the action data determined by the policy
        Vector3 controlSignal = Vector3.zero;
        controlSignal.x = vectorAction[0]; // Force applied in the X direction (-1.0 to +1.0)
        controlSignal.z = vectorAction[1]; // Force applied in the Z direction (-1.0 to +1.0)
        rBody.AddForce(controlSignal * 10);

        // Measure the distance between the RollerAgent and the Target
        float distanceToTarget = Vector3.Distance(this.transform.position, target.position);

        // When the RollerAgent arrives at the Target position
        if (distanceToTarget < 1.42f)
        {
            AddReward(1.0f); // Give a reward
            EndEpisode();    // End the episode
        }

        // When the RollerAgent falls off the floor
        if (this.transform.position.y < 0)
        {
            EndEpisode(); // End the episode without a reward
        }
    }
}
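Two numeric details of the script are easy to miss: where the Target can appear after a reset, and how the two action values become a force. The Python sketch below mirrors them for illustration only. The function names are hypothetical, and Python's `random()` returns values in [0, 1) while Unity's `Random.value` also includes 1.0, so the ranges agree only approximately at the upper edge.

```python
import random
from typing import Tuple

FORCE_SCALE = 10.0  # multiplier in rBody.AddForce(controlSignal * 10)


def sample_target_position(rng: random.Random) -> Tuple[float, float, float]:
    """Mirror of Random.value * 8 - 4: X and Z fall between -4 and 4, Y stays at 0.5."""
    return (rng.random() * 8 - 4, 0.5, rng.random() * 8 - 4)


def action_to_force(action_x: float, action_z: float) -> Tuple[float, float, float]:
    """Map the two continuous actions to the force vector passed to AddForce."""
    return (action_x * FORCE_SCALE, 0.0, action_z * FORCE_SCALE)


rng = random.Random(0)
print(sample_target_position(rng))
print(action_to_force(1.0, -0.5))  # (10.0, 0.0, -5.0)
```

Because each action value lies between -1.0 and +1.0, the force on each axis stays between -10 and +10.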

9. Set RollerAgent properties

Max Step: the maximum number of steps per episode. The episode ends when its step count exceeds this value. Set Max Step to 1000, and assign the yellow box "Target" to the Target field.

10. Create the hyperparameter configuration file

・ Create a sample directory in ml-agents/config/
・ Create a RollerBall.yaml file in it, with the contents shown below

Hyperparameters (the training configuration file uses the .yaml extension, pronounced "yamel")
・ Parameters used for learning
・ They must be tuned by a human
・ The available settings differ for each reinforcement learning algorithm (PPO / SAC)


behaviors:
  RollerBall:
    trainer_type: ppo
    summary_freq: 1000
    hyperparameters:
      batch_size: 10
      buffer_size: 100
      learning_rate: 0.0003
      learning_rate_schedule: linear
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: true
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
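To get a feel for how batch_size, buffer_size, and num_epoch relate, here is a small Python sketch. It assumes the usual PPO scheme in which the collected buffer is split into minibatches of batch_size and passed over num_epoch times; the helper name `updates_per_buffer` is made up, and the result is an approximate count rather than an exact description of the trainer's internals.

```python
from typing import Dict

# Values copied from the configuration above
config: Dict[str, int] = {"batch_size": 10, "buffer_size": 100, "num_epoch": 3}


def updates_per_buffer(cfg: Dict[str, int]) -> int:
    """Approximate number of gradient updates each time the buffer fills up."""
    minibatches = cfg["buffer_size"] // cfg["batch_size"]  # minibatches per pass
    return minibatches * cfg["num_epoch"]                  # passes over the buffer


print(updates_per_buffer(config))  # 30
```

Under this assumption, keeping buffer_size several times larger than batch_size lets each policy update draw on several minibatches of experience.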

Start training the RollerAgent

Activate the virtual environment created in the previous Qiita article.


poetry shell

Execute the following command in the ml-agents directory


mlagents-learn config/sample/RollerBall.yaml --run-id=model01

The trailing model01 is the run ID. Give each new training run a different name.


Start training by pressing the Play button in the Unity Editor.

When the message above is displayed in the terminal, go back to Unity and press the Play button to start training.

Training information is displayed in the terminal at the interval set by summary_freq (every 1000 steps with the configuration above). Mean Reward: the average reward per episode. The higher the value, the better the agent is performing. Once it reaches 1.0, stop the training.
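Since an episode in this environment ends with a reward of either +1.0 (Target reached) or 0 (fell off the floor or ran out of steps), the mean reward can be read as the fraction of successful episodes. A minimal Python sketch, using made-up episode results rather than measured ones:

```python
from statistics import mean
from typing import List


def mean_reward(episode_rewards: List[float]) -> float:
    """Average reward over a set of finished episodes."""
    return mean(episode_rewards)


# Illustrative values only: 3 successes and 1 failure
print(mean_reward([1.0, 0.0, 1.0, 1.0]))  # 0.75
```

A Mean Reward of 1.0 therefore means that every episode in the reporting interval ended with the RollerAgent reaching the Target.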
