- Install Unity Hub along with Unity 2018.4.24f1 (LTS) or higher.
- Install Python 3.6.1 or higher (an Anaconda installation is preferred, as we'll be creating a virtual environment shortly).
- Install the ML-Agents Unity Package by cloning the latest stable release of the Unity ML-Agents Toolkit:
  `$ git clone --branch release_6 https://github.com/Unity-Technologies/ml-agents.git`
  Note: For details regarding the installation of Unity ML-Agents, please consult the official installation guide.
- Install the ML-Agents Python Package (tested version: `mlagents 0.19.0`):
  - Create a virtual environment (strongly recommended):
    `$ conda create --name ML-Agents python=3.7`
  - Activate the environment:
    `$ conda activate ML-Agents`
  - Install the `mlagents` package from PyPI (this command also installs the required dependencies):
    `$ pip3 install mlagents`
- Setup the MARL Simulator:
  - Navigate to the Unity ML-Agents Repository directory:
    `$ cd <path/to/unity-ml-agents/repository>`
  - Clone this repository:
    `$ git clone https://github.com/Tinker-Twins/MARL-Simulator.git`
  - Launch Unity Hub and select the `ADD` project button.
  - Navigate to the Unity ML-Agents Repository directory and select the parent folder of this repository, `MARL-Simulator`.
Every agent needs a script inherited from the `Agent` class. The following are some of the useful methods:

- `public override void Initialize()`
  Initializes the environment. Similar to `void Start()`.
- `public override void CollectObservations(VectorSensor sensor)`
  Collects observations. Use `sensor.AddObservation(xyz)` to add an observation "xyz".
- `public override void OnActionReceived(float[] vectorAction)`
  Defines the actions to be performed using the passed `vectorAction`. The reward function is also defined here; you can use `if`-`else` cases to define rewards/penalties. Don't forget to call `EndEpisode()` to indicate the end of an episode.
- `public override void OnEpisodeBegin()`
  This is called when `EndEpisode()` is called. Define your "reset" algorithm here before starting the next episode.
- `public override void Heuristic(float[] actionsOut)`
  Use `actionsOut[i]` to define manual controls during `Heuristic Only` behaviour.
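Putting the methods above together, a minimal agent script might look like the following sketch (this is illustrative only; the class name `MobileRobotAgent` and the fields `target` and `moveSpeed` are hypothetical, and the reward values are placeholders, not the reward function used in this project):

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class MobileRobotAgent : Agent
{
    public Transform target;       // hypothetical goal the agent should reach
    public float moveSpeed = 1.0f; // hypothetical motion scale

    private Vector3 startPosition;

    public override void Initialize()
    {
        // One-time setup, similar to void Start()
        startPosition = transform.localPosition;
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Observe the agent's own position and the target's position
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(target.localPosition);
    }

    public override void OnActionReceived(float[] vectorAction)
    {
        // Apply the continuous actions as planar motion
        Vector3 move = new Vector3(vectorAction[0], 0f, vectorAction[1]);
        transform.localPosition += moveSpeed * Time.deltaTime * move;

        // Reward function with if-else cases (placeholder values)
        float distance = Vector3.Distance(transform.localPosition, target.localPosition);
        if (distance < 0.5f)
        {
            SetReward(1.0f);    // reached the goal
            EndEpisode();       // indicate end of episode
        }
        else
        {
            SetReward(-0.001f); // small time penalty
        }
    }

    public override void OnEpisodeBegin()
    {
        // "Reset" algorithm before the next episode starts
        transform.localPosition = startPosition;
    }

    public override void Heuristic(float[] actionsOut)
    {
        // Manual controls during Heuristic Only behaviour
        actionsOut[0] = Input.GetAxis("Horizontal");
        actionsOut[1] = Input.GetAxis("Vertical");
    }
}
```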
Attach this script to the agent along with the `BehaviourParameters` and `DecisionRequester` scripts inbuilt with the ML-Agents Unity Package (just search their names in the `Add Component` dropdown menu of the agent gameobject). After defining your logic, test the functionality by selecting `Heuristic Only` as the `Behaviour Type` of the `BehaviourParameters` script attached to the agent.
- Create a configuration file (`<config>.yaml`) to define the training parameters. For details, refer to the official training configuration guide.
  Note: Two configuration files are provided, `C-MARL.yaml` and `NC-MARL.yaml`, for cooperative and non-cooperative multi-agent motion planning, respectively.
- Within the `BehaviourParameters` script attached to the agent, give a unique `Behaviour Name` for training purposes.
- Activate the `ML-Agents` environment:
  `$ conda activate ML-Agents`
- Navigate to the Unity ML-Agents Repository directory:
  `$ cd <path/to/unity-ml-agents/repository>`
- Start the training:
  `$ mlagents-learn <path/to/config>.yaml --run-id=<Run1>`
- Hit the `Play` button in the Unity Editor to "actually" start the training.
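For reference, a training configuration file for this version of ML-Agents generally takes the following shape (this is a sketch, not the contents of the provided files; the behaviour name `RobotBehaviour` and all hyperparameter values are illustrative):

```yaml
behaviors:
  RobotBehaviour:       # must match the Behaviour Name set in BehaviourParameters
    trainer_type: ppo   # PPO trainer
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
```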
- Navigate to the Unity ML-Agents Repository directory:
  `$ cd <path/to/unity-ml-agents/repository>`
- Launch TensorBoard to analyze the training results:
  `$ tensorboard --logdir results`
- Open a browser application (tested with Google Chrome) and navigate to http://localhost:6006 to view the training results.
- Navigate to the Unity ML-Agents Repository directory and locate the folder called `results`.
- Open the `results` folder and locate the folder named after the `<training_behaviour_name>` that you used while training the agent(s).
- Copy the saved neural network models (the `*.nn` files) into the `TF Models` folder of the `MARL Simulator` Unity project.
- In the Inspector window, attach the respective NN model(s) to the `Model` variable in the `BehaviourParameters` script attached to the agent(s).
- Select `Inference Only` as the `Behaviour Type` of the `BehaviourParameters` script attached to the agent(s).
- Hit the play button in the Unity Editor and watch your agent(s) play!
- Craft the reward function carefully; agents cheat a lot!
- Tune the training parameters in the `<config>.yaml` file.
- Duplicate the training arenas within the scene as much as possible to enable parallel (faster) training.
  Note: Make sure to commit changes (if any) to all the duplicates as well!
Implementation demonstrations are available on YouTube.
Please cite the following paper when using the MARL Simulator for your research:
@inproceedings{MARL-2020,
author = {Sivanathan, K. and Vinayagam, B. K. and Samak, Tanmay and Samak, Chinmay},
booktitle = {2020 3rd International Conference on Intelligent Sustainable Systems (ICISS)},
title = {Decentralized Motion Planning for Multi-Robot Navigation using Deep Reinforcement Learning},
year = {2020},
pages = {709-716},
doi = {10.1109/ICISS49785.2020.9316033},
url = {https://doi.org/10.1109/ICISS49785.2020.9316033}
}
This work has been published in the 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS). The publication can be found on IEEE Xplore.