[ML-Agents] I tried machine learning using Unity and Python TensorFlow (v0.11β compatible)

Introduction

I couldn't find any Japanese articles covering v0.11.0, so I wrote this up as a memorandum.

This article is __for beginners__: a Unity beginner imitates one of the official ML-Agents tutorials and tries out __reinforcement learning__, one branch of machine learning.

Rollerball.gif __We will make something like this :arrow_up:__

It's for people who can already get around Unity but haven't tried machine learning yet. Rather than focusing on theory, the aim is to let you experience it hands-on.

*This article is current as of November 13, 2019.* ML-Agents is undergoing rapid version upgrades, so always check for the latest information. ~~[The book published last year](https://www.amazon.co.jp/Unity%E3%81%A7%E3%81%AF%E3%81%98%E3%82%81%E3%82%8B%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%83%BB%E5%BC%B7%E5%8C%96%E5%AD%A6%E7%BF%92-Unity-ML-Agents%E5%AE%9F%E8%B7%B5%E3%82%B2%E3%83%BC%E3%83%A0%E3%83%97%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%9F%E3%83%B3%E3%82%B0-%E5%B8%83%E7%95%99%E5%B7%9D-%E8%8B%B1%E4%B8%80/dp/48624648181) was no help~~ (This year's transitions ⇒ January 2019: *v0.6* ➡ April: *v0.8* ➡ October: *v0.10* ➡ November: *v0.11*)

The key points, roughly

Here are the essential terms for doing machine learning in Unity: __"Academy", "Brain", and "Agent"__.

Basically, within the environment defined by the "Academy" in Unity, the "Brain" controls the actions taken by the "Agent". This time we will perform reinforcement learning via external TensorFlow (a Python framework), then load the generated neural network model into Unity and run it. (This is a simple tutorial, so we won't touch the Academy very much.)
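To preview the shape of what we are about to build, here is a skeleton using only the class and method names that appear later in this tutorial (bodies omitted; this is just an outline, not the finished code):

using MLAgents;

// The environment: inherits everything it needs from the Academy base class.
public class RollerAcademy : Academy { }

// The actor: overrides the hooks that the Brain (Behavior Parameters) drives.
public class RollerAgent : Agent
{
    public override void AgentReset() { /* reset positions for a new episode */ }
    public override void CollectObservations() { /* pass observations to the Brain */ }
    public override void AgentAction(float[] vectorAction, string textAction) { /* act and reward */ }
}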

ML-Agents

Major changes from version 0.10.0

__If you are new to ML-Agents, you can skip this section.__ I had been using v0.8.x and v0.9.x and was puzzled at first because *Brain Parameters* no longer exists; if you only follow this article, though, you should be fine.

- *Broadcast Hub* has been abolished.
- *Brain Scriptable Objects* have been abolished ⇒ replaced by the *Behavior Parameters* component.
- Major setup changes to *Visual Observation*.
- The gRPC definitions have been renewed.
- Online BC training has been abolished.

Execution environment

  • Windows 10
  • Unity 2019.1.4f1
  • ML-Agents Beta 0.11.0
  • Python 3.6(Anaconda)

Preparation

Please install the following first.

- Unity (any version from 2017.4 onward should be fine)
- The ml-agents repository (download *ml-agents-master* from GitHub)
- Anaconda (Python 3.6)

Project creation

    1. Launch Unity and create a project called Roller Ball.
    2. Open *File > Build Settings... > Player Settings... > Other Settings > Configuration* and make sure that *Scripting Runtime Version* is *.NET 4.x Equivalent* and that *Api Compatibility Level* is *.NET 4.x*. YHN.png
    3. Load the ML-Agents assets into your project. They are located in the downloaded ml-agents-master\UnitySDK\Assets; drag and drop the ML-Agents folder into your project window. ooop.png

Stage creation

Creating a floor

- Create a plane with *3D Object > Plane*.
- Name the created *Plane* "Floor".
- Set Floor's *Transform* to:
  • Position = (0, 0, 0)
  • Rotation = (0, 0, 0)
  • Scale = (1, 1, 1)
- Play with the *Element* entries under *Inspector > Materials* to make it look however you like. ppppp.png

Creating a box (Target)

- Create a cube with *3D Object > Cube*.
- Name the created *Cube* "Target".
- Set Target's *Transform* to:
  • Position = (3, 0.5, 3)
  • Rotation = (0, 0, 0)
  • Scale = (1, 1, 1)
- As with Floor, change the appearance to your liking. box.png

Creating a soccer ball (Agent)

- Create a sphere with *3D Object > Sphere*.
- Name the created *Sphere* "RollerAgent".
- Set RollerAgent's *Transform* to:
  • Position = (0, 0.5, 0)
  • Rotation = (0, 0, 0)
  • Scale = (1, 1, 1)
- As before, change the appearance to your liking. If you want it to look like a ball, the CheckerSquare material works well.
- Add a *Rigidbody* via *Add Component*. kkkkk.png

Creating an empty object (Academy)

- Create an empty *GameObject* with *Create Empty*.
- Name the created *GameObject* `Academy`. oooiiiii.png

Next, we will write the contents in C#.

Implementation of Academy (Implement an Academy)

- With `Academy` selected in the *Hierarchy* window, use *Add Component > New Script* to create a script named `RollerAcademy.cs`.
- Rewrite the contents of `RollerAcademy.cs` as follows. You can erase the auto-generated contents.

RollerAcademy.cs


using MLAgents;
public class RollerAcademy : Academy { }

With this description, basic functionality such as the "observe, decide, act" cycle (omitted here) is inherited from the *Academy* class by the *RollerAcademy* class, so two lines are enough.
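For reference, larger environments can customize the Academy by overriding its virtual hooks. A minimal sketch follows; the hook names are my recollection of the v0.11 API, so treat them as an assumption and verify against the Academy class in your ML-Agents package. None of this is needed for this tutorial.

using MLAgents;

public class RollerAcademy : Academy
{
    // NOTE: method names assumed for v0.11; check your package's Academy class.
    public override void InitializeAcademy() { /* one-time environment setup */ }
    public override void AcademyReset() { /* environment-wide reset logic */ }
    public override void AcademyStep() { /* per-step environment logic */ }
}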

Implementation of Agent (Implement an Agent)

Select RollerAgent in the *Hierarchy* window and create a script named `RollerAgent.cs` with *Add Component > New Script*.

Inheriting the *Agent* base class

Rewrite the contents of RollerAgent.cs as follows.

RollerAgent.cs


using MLAgents;
public class RollerAgent : Agent{ }

As with *Academy*, it imports the *MLAgents* namespace and specifies *Agent* as the base class to inherit from.

This is the basic procedure for incorporating __ML-Agents into Unity__. Next, we will add the mechanism that makes the ball charge toward the box through reinforcement learning.

Initialization and Resetting

Rewrite the contents of RollerAgent.cs as follows.

RollerAgent.cs


using UnityEngine;
using MLAgents;

public class RollerAgent : Agent
{
    Rigidbody rBody;
    void Start(){
        rBody = GetComponent<Rigidbody>();
    }

    public Transform Target;
    public override void AgentReset()
    {
        if (this.transform.position.y < 0)
        {
            // Reset angular velocity and velocity
            this.rBody.angularVelocity = Vector3.zero;
            this.rBody.velocity = Vector3.zero;
            // Return the agent to its initial position
            this.transform.position = new Vector3(0, 0.5f, 0);
        }
        // Relocate the target
        Target.position = new Vector3(Random.value * 8 - 4, 0.5f,
                                      Random.value * 8 - 4);
    }

}

Here, we process:

- __relocating the Target__ for the next episode when the RollerAgent reaches the box (Target)
- __returning the RollerAgent__ to its starting position when it falls off the floor (Floor)

Rigidbody is a component used by Unity's physics simulation; here it is used to drive the agent. The values of *Position, Rotation, Scale* are recorded in the Transform. By declaring Target as public, we can pass the *Target*'s Transform in from the *Inspector*.

Observing the Environment

Add the following method inside the RollerAgent class in RollerAgent.cs.

public override void CollectObservations()
{
    // Positions of the target and the agent
    AddVectorObs(Target.position);
    AddVectorObs(this.transform.position);

    // Agent velocity
    AddVectorObs(rBody.velocity.x);
    AddVectorObs(rBody.velocity.z);
}

Here, we __collect the observed data into a feature vector__.

The 3D coordinates of the *Target* and the *Agent*, plus the agent's *x* and *z* velocities, form an 8-dimensional vector in total that is passed to the neural network. ~~"8 dimensions" just sounds cool~~
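To see where the 8 comes from, here is the layout as a plain reading of the code above (this number must match the *Vector Observation Space Size* we set later):

// Observation vector layout (8 floats in total):
//   Target.position          -> 3 (x, y, z)
//   this.transform.position  -> 3 (x, y, z)
//   rBody.velocity.x         -> 1
//   rBody.velocity.z         -> 1
// Total: 8, matching Vector Observation Space Size = 8.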

Actions and Rewards

Add the following processing, the `AgentAction()` function, to `RollerAgent.cs`.

public float speed = 10f;
public override void AgentAction(float[] vectorAction, string textAction)
{
    // Action
    Vector3 controlSignal = Vector3.zero;
    controlSignal.x = vectorAction[0];
    controlSignal.z = vectorAction[1];
    rBody.AddForce(controlSignal * speed);

    // Reward
    // Distance from the ball (agent) to the box (target)
    float distanceToTarget = Vector3.Distance(this.transform.position,
                                              Target.position);

    // When the box (target) is reached
    if (distanceToTarget < 1.42f)
    {
        // Grant the reward and finish the episode
        SetReward(1.0f);
        Done();
    }

    // If the agent falls off the floor
    if (this.transform.position.y < 0)
    {
        Done();
    }
}

Here, the learning algorithm processes the __"action"__, which reads the two continuous values as forces applied in the X and Z directions to move the agent, and the __"reward"__, which is granted when the agent reaches the box safely and withheld when it falls.

The `AddForce` function applies a physical force to an object that has a *Rigidbody* component in order to move it. Only when the computed distance to the target falls below the threshold for judging arrival is the reward granted and the episode reset.

To learn well in more complicated situations, it is often effective to hand out punishments (negative rewards) as well as rewards. ~~(In v0.5.x the reward was -1 when the agent fell off the floor, but apparently that was judged unnecessary in the latest version)~~
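As an illustration only (not part of this tutorial's final script), a penalty on falling could look like the sketch below; the value -1.0f is borrowed from the old v0.5.x behavior mentioned above and is an assumption, not something the current official example uses.

// If the agent falls off the floor, punish it and end the episode.
if (this.transform.position.y < 0)
{
    SetReward(-1.0f);  // assumed penalty value, for illustration only
    Done();
}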

In summary, RollerAgent.cs looks like this:

RollerAgent.cs


using UnityEngine;
using MLAgents;

public class RollerAgent : Agent
{
    Rigidbody rBody;
    void Start(){
        rBody = GetComponent<Rigidbody>();
    }

    public Transform Target;
    public override void AgentReset()
    {
        if (this.transform.position.y < 0)
        {
            // Reset angular velocity and velocity
            this.rBody.angularVelocity = Vector3.zero;
            this.rBody.velocity = Vector3.zero;
            // Return the agent to its initial position
            this.transform.position = new Vector3(0, 0.5f, 0);
        }
        // Relocate the target
        Target.position = new Vector3(Random.value * 8 - 4, 0.5f,
                                      Random.value * 8 - 4);
    }

    public override void CollectObservations()
    {
        // Positions of the target and the agent
        AddVectorObs(Target.position);
        AddVectorObs(this.transform.position);

        // Agent velocity
        AddVectorObs(rBody.velocity.x);
        AddVectorObs(rBody.velocity.z);
    }

    public float speed = 10f;
    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // Action
        Vector3 controlSignal = Vector3.zero;
        controlSignal.x = vectorAction[0];
        controlSignal.z = vectorAction[1];
        rBody.AddForce(controlSignal * speed);

        // Reward
        // Distance from the ball (agent) to the box (target)
        float distanceToTarget = Vector3.Distance(this.transform.position,
                                                  Target.position);

        // When the box (target) is reached
        if (distanceToTarget < 1.42f)
        {
            // Grant the reward and finish the episode
            SetReward(1.0f);
            Done();
        }

        // If the agent falls off the floor
        if (this.transform.position.y < 0)
        {
            Done();
        }
    }
}

Finish on the Unity editor

- Select RollerAgent in the *Hierarchy* window and change two items on the `RollerAgent (Script)` component:
  • Decision Interval = 10
  • Target = Target (Transform)

sct.png

- Add *Behavior Parameters* with *Add Component* and change the settings as follows:

  • Behavior Name = RollerBallBrain
  • Vector Observation Space Size = 8
  • Vector Action Space Type = Continuous
  • Vector Action Space Size = 2

Be.png

Also, according to the [official documentation](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Create-New.md), training with the default parameters takes on the order of 300,000 steps. Our task is not that complicated, so let's rewrite a couple of parameters to bring the number of trials down to under 20,000 steps.

- Open trainer_config.yaml under *ml-agents-master-0.11 > config* with an editor (VS Code or Notepad) and rewrite the values of the following items.

aaaa.png

batch_size: 10
buffer_size: 100
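For reference, a minimal sketch of how these values can sit in a section keyed by the Behavior Name we set earlier; the exact section layout is an assumption on my part, and the two values above are what matter:

RollerBallBrain:
    batch_size: 10
    buffer_size: 100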

Now you are ready to train.

Manual test

We're almost there. Before reinforcement learning, let's check manually that the environment built so far works properly. Add the following method to the RollerAgent class in RollerAgent.cs.

public override float[] Heuristic()
{
    // Map keyboard input onto the two continuous actions
    var action = new float[2];
    action[0] = Input.GetAxis("Horizontal");
    action[1] = Input.GetAxis("Vertical");
    return action;
}

Horizontal accepts the horizontal input axis and Vertical accepts the vertical input axis (Input.GetAxis returns values between -1 and 1).

You can now move the agent with the "W", "A", "S", "D" or arrow keys.

Finally, in the RollerAgent's *Inspector*, tick the *Use Heuristic* checkbox under *Behavior Parameters*.

he.png

Press Play to run it. If you can confirm the agent responds to key input, the test is a success.

Learn with TensorFlow

Now, let's move on to the learning step.

Environment construction / library installation

First, launch Anaconda Prompt. You can find it right away by searching from the Start menu (Win key). an.png

conda create -n ml-agents python=3.6

Enter this to build a virtual environment. [^1] on.png

Proceed([y]/n)?

You will be asked whether to proceed with the installation, so enter y. Next,

activate ml-agents

Enter this to switch into the virtual environment. [^2] Make sure that (ml-agents) now appears at the start of the command line. aaaa.png

cd <ml-agents folder>

Move there. [^3]

pip install mlagents

This installs the library that ML-Agents uses (it takes a few minutes). The installation also pulls in dependencies such as TensorFlow and Jupyter.

After a while, if a screen like this appears, you are OK. wewe.png

cd <ml-agents folder>\ml-agents-envs

Move there.

pip install -e .

to install the package. konnna.png If the screen looks like this, you are OK. Next,

cd <ml-agents folder>\ml-agents

Move there.

pip install -e .

to install the package. www.png

This completes the preparation on the Python side.
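As a quick sanity check before moving on (my own habit, not a step from the original procedure), you can confirm the CLI was installed into the virtual environment:

mlagents-learn --help

If the help text prints, the Python side is ready.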

__:collision: [Note]: The TensorFlowSharp plugin is not used in v0.6.x and later.__ If you have been following older books, we recommend recreating a fresh virtual environment.

Up to ML-Agents v0.5.0, TensorFlowSharp was used to communicate with Python, but do not use it with the latest version. If you do, the following error occurs.

No model was present for the Brain 3DBallLearning.
UnityEngine.Debug:LogError(Object)
MLAgents.LearningBrain:DecideAction() (at Assets/ML-Agents/Scripts/LearningBrain.cs:191)
MLAgents.Brain:BrainDecideAction() (at Assets/ML-Agents/Scripts/Brain.cs:80)
MLAgents.Academy:EnvironmentStep() (at Assets/ML-Agents/Scripts/Academy.cs:601)
MLAgents.Academy:FixedUpdate() (at Assets/ML-Agents/Scripts/Academy.cs:627)

(Source)


Reinforcement Learning

Now, at last, the training begins. The dreamed-of AI experience is just around the corner. Let's do our best.

cd <ml-agents folder>

Enter this to move to the downloaded folder.

mlagents-learn config/trainer_config.yaml --run-id=firstRun --train

Run this. [^4] aaaaw.png At the bottom of the command line, __INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.__ is displayed (go back to the Unity editor and press the Play button to start training).

Go back to the Unity editor, uncheck *Use Heuristic* in __*Behavior Parameters*__, and press the :arrow_forward: Play button.

When the ball starts chasing the box, training has started normally.

__If you do not press the Play button for a while, a timeout error occurs; if so, run the same command again.__

A log is written to the console every 1,000 steps. If you want to stop partway through, you can interrupt with Ctrl + C. (If you deliberately stop early, you can make a "weak AI".) おー!!!!.png

__Step__ is the number of training steps, __Mean Reward__ is the average reward earned, and __Std of Reward__ is the standard deviation of the reward (a measure of how much it varies).

After training, a RollerBallBrain.nn file is created under <ml-agents folder>\models\<id name~>.

hyhy.png

Learning reflection

Now let's try out the generated neural network model.

Copy the RollerBallBrain.nn file from earlier into the *Assets* folder of Unity's Project window. (The location can be anywhere inside the project.) wwqqqq.png

Then click the :radio_button: button at the far right of the *Model* item in the RollerAgent's *Inspector* and select the imported .nn file. (Be careful not to mix files up if another .nn file has the same name.)

Also, if *Use Heuristic* in *Behavior Parameters* is left checked, it will not work properly. __Be sure to uncheck it after the manual test.__ aaaqqqq.png

Now let's press: arrow_forward: Play.

__If the ball starts chasing the box on its own, you have succeeded.__ 30fps.gif

(Bonus) Observe the transition graph with TensorBoard

In the Anaconda Prompt, run the following from the <ml-agents folder> (the summaries folder is created in the directory where you ran mlagents-learn):

tensorboard --logdir=summaries --port=6006

If you open [localhost:6006](http://localhost:6006/) in your browser, you can watch the learning progress as graphs. ほほう、TensorBoardですか。(ニチャア・・・.png

Summary

- If you can write more serious C#, you will be able to fine-tune the algorithm yourself.
- In reinforcement learning, the AI's skill can be graded (weak, medium, strong, ...) by the number of training steps.
- Versions are renewed frequently, so __information goes stale quickly__.
- ~~Learning is far faster than a human's. The power of science is amazing!!~~

It is a convenient world where even a beginner can use ready-made assets to walk through simple machine learning in a day. How was it, actually trying it out? I hope it gives you an opportunity to become interested in machine learning.

If you notice any questionable phrasing or mistakes, I would appreciate it if you pointed them out. Also, if you found this article helpful, a "like" would be __encouraging__.

Thank you for reading.

reference

Below are the articles by those who came before me, which were a great help. I would like to take this opportunity to express my __gratitude__.

- Unity-Technologies official documentation (GitHub)
- ml-agents Migration Guide (GitHub)
- [Unity: How to use ML-Agents in September 2019 (ver0.9.0/0.9.1/0.9.2)](https://www.fast-system.jp/unity-ml-agents-version-0-9-0-howto/)
- [Unity] I tried a tutorial on reinforcement learning (ML-Agents v0.8.1)
- [Create a new learning environment with Unity's ML-Agents (0.6.0a version)](http://am1tanaka.hatenablog.com/entry/2019/01/18/212915#%E5%AD%A6%E7%BF%92%E5%8A%B9%E6%9E%9C%E3%82%92%E9%AB%98%E3%82%81%E3%82%8B%E3%81%8A%E3%81%BE%E3%81%91)

[^1]: You can change the "ml-agents" part to any name you like.
[^2]: Activate with the virtual environment name you set.
[^3]: The directory where *ml-agents-master* was downloaded in Preparation.
[^4]: You can change the "firstRun" part to any name you like.
