- Install Unity Hub along with Unity 2018.4.24f1 (LTS) or higher.
- Install Python 3.6.1 or higher (an Anaconda installation is preferred, as we'll be creating a virtual environment shortly).
- Install the ML-Agents Unity Package by cloning the latest stable release of the Unity ML-Agents Toolkit:
  `$ git clone --branch release_6 https://github.com/Unity-Technologies/ml-agents.git`
  Note: For details regarding the installation of Unity ML-Agents, please consult the official installation guide.
- Install the ML-Agents Python Package (tested version: `mlagents 0.19.0`):
  - Create a virtual environment (strongly recommended):
    `$ conda create --name ML-Agents python=3.7`
  - Activate the environment:
    `$ conda activate ML-Agents`
  - Install the `mlagents` package from PyPI (this command also installs the required dependencies):
    `$ pip3 install mlagents`
- Setup the MARL Simulator:
  - Navigate to the Unity ML-Agents Repository directory:
    `$ cd <path/to/unity-ml-agents/repository>`
  - Clone this repository:
    `$ git clone https://github.com/Tinker-Twins/MARL-Simulator.git`
  - Launch Unity Hub and select the `ADD` project button.
  - Navigate to the Unity ML-Agents Repository directory and select the parent folder of this repository, `MARL-Simulator`.
Every agent needs a script inherited from the `Agent` class. The following are some of the useful methods:

- `public override void Initialize()`
  Initializes the environment. Similar to `void Start()`.
- `public override void CollectObservations(VectorSensor sensor)`
  Collects observations. Use `sensor.AddObservation(xyz)` to add an observation "xyz".
- `public override void OnActionReceived(float[] vectorAction)`
  Defines the actions to be performed using the passed `vectorAction`. The reward function is also defined here; you can use `if`-`else` cases to define rewards/penalties. Don't forget to call `EndEpisode()` to indicate the end of an episode.
- `public override void OnEpisodeBegin()`
  This is called when `EndEpisode()` is called. Define your "reset" algorithm here before starting the next episode.
- `public override void Heuristic(float[] actionsOut)`
  Use `actionsOut[i]` to define manual controls during `Heuristic Only` behaviour.
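Putting the methods above together, a minimal agent script might look like the following sketch (this is illustrative only; the class name `MobileRobotAgent` and the fields `target` and `moveSpeed` are hypothetical, and the reward values are placeholders, not the reward function used in this project):

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class MobileRobotAgent : Agent
{
    public Transform target;       // hypothetical goal the agent should reach
    public float moveSpeed = 1.0f; // hypothetical motion scale

    private Vector3 startPosition;

    public override void Initialize()
    {
        // One-time setup, similar to void Start()
        startPosition = transform.localPosition;
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Observe the agent's own position and the target's position
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(target.localPosition);
    }

    public override void OnActionReceived(float[] vectorAction)
    {
        // Apply the continuous actions as planar motion
        Vector3 move = new Vector3(vectorAction[0], 0f, vectorAction[1]);
        transform.localPosition += moveSpeed * Time.deltaTime * move;

        // Reward function with if-else cases (placeholder values)
        float distance = Vector3.Distance(transform.localPosition, target.localPosition);
        if (distance < 0.5f)
        {
            SetReward(1.0f);    // reached the goal
            EndEpisode();       // indicate end of episode
        }
        else
        {
            SetReward(-0.001f); // small time penalty
        }
    }

    public override void OnEpisodeBegin()
    {
        // "Reset" algorithm before the next episode starts
        transform.localPosition = startPosition;
    }

    public override void Heuristic(float[] actionsOut)
    {
        // Manual controls during Heuristic Only behaviour
        actionsOut[0] = Input.GetAxis("Horizontal");
        actionsOut[1] = Input.GetAxis("Vertical");
    }
}
```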
Attach this script to the agent along with the `BehaviourParameters` and `DecisionRequester` scripts inbuilt with the ML-Agents Unity Package (just search their names in the `Add Component` dropdown menu of the agent gameobject). After defining your logic, test the functionality by selecting `Heuristic Only` as the `Behaviour Type` of the `BehaviourParameters` script attached to the agent.
- Create a configuration file (`<config>.yaml`) to define the training parameters. For details, refer to the official training configuration guide.
  Note: Two configuration files are provided, `C-MARL.yaml` and `NC-MARL.yaml`, for cooperative and non-cooperative multi-agent motion planning, respectively.
- Within the `BehaviourParameters` script attached to the agent, give a unique `Behaviour Name` for training purposes.
- Activate the `ML-Agents` environment:
  `$ conda activate ML-Agents`
- Navigate to the Unity ML-Agents Repository directory:
  `$ cd <path/to/unity-ml-agents/repository>`
- Start the training:
  `$ mlagents-learn <path/to/config>.yaml --run-id=<Run1>`
- Hit the `Play` button in the Unity Editor to "actually" start the training.
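For reference, a training configuration file for this version of ML-Agents generally takes the following shape (this is a sketch, not the contents of the provided files; the behaviour name `RobotBehaviour` and all hyperparameter values are illustrative):

```yaml
behaviors:
  RobotBehaviour:       # must match the Behaviour Name set in BehaviourParameters
    trainer_type: ppo   # PPO trainer
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
```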
- Navigate to the Unity ML-Agents Repository directory:
  `$ cd <path/to/unity-ml-agents/repository>`
- Launch TensorBoard to analyze the training results:
  `$ tensorboard --logdir results`
- Open a browser application (tested with Google Chrome) and navigate to http://localhost:6006 to view the training results.
- Navigate to the Unity ML-Agents Repository directory and locate the folder called `results`.
- Open the `results` folder and locate the folder named after the `<training_behaviour_name>` that you used while training the agent(s).
- Copy the saved neural network models (the `*.nn` files) into the `TF Models` folder of the `MARL Simulator` Unity project.
- In the Inspector window, attach the respective NN model(s) to the `Model` variable in the `BehaviourParameters` script attached to the agent(s).
- Select `Inference Only` as the `Behaviour Type` of the `BehaviourParameters` script attached to the agent(s).
- Hit the play button in the Unity Editor and watch your agent(s) play!
- Craft the reward function carefully; agents cheat a lot!
- Tune the training parameters in the `<config>.yaml` file.
- Duplicate the training arenas within the scene as much as possible to enable parallel (faster) training.
  Note: Make sure to commit changes (if any) to all the duplicates as well!
Implementation demonstrations are available on YouTube.
Please cite the following paper when using the MARL Simulator for your research:
@inproceedings{MARL-2020,
author = {Sivanathan, K. and Vinayagam, B. K. and Samak, Tanmay and Samak, Chinmay},
booktitle = {2020 3rd International Conference on Intelligent Sustainable Systems (ICISS)},
title = {Decentralized Motion Planning for Multi-Robot Navigation using Deep Reinforcement Learning},
year = {2020},
pages = {709-716},
doi = {10.1109/ICISS49785.2020.9316033},
url = {https://doi.org/10.1109/ICISS49785.2020.9316033}
}
This work has been published in the 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS). The publication can be found on IEEE Xplore.