
IC3Net

This repository contains the reference implementation for the IC3Net paper (accepted at ICLR 2019), Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks, available at https://arxiv.org/abs/1812.09755.

Cite

If you use this code or IC3Net in your work, please cite the following:

@article{singh2018learning,
  title={Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks},
  author={Singh, Amanpreet and Jain, Tushar and Sukhbaatar, Sainbayar},
  journal={arXiv preprint arXiv:1812.09755},
  year={2018}
}

Standalone environment version

Installation

First, clone the repository and install ic3net-envs, which contains the implementations of the Predator-Prey and Traffic-Junction environments:

git clone https://github.com/IC3Net/IC3Net
cd IC3Net/ic3net-envs
python setup.py develop
pip install tensorboardX
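
To sanity-check the environment installation, you can try importing the package from Python. This is only an illustrative snippet; the environment ID below is an assumption based on how ic3net-envs registers with Gym, so check ic3net_envs/__init__.py for the exact names.

import gym
import ic3net_envs  # registers the Predator-Prey and Traffic-Junction environments with Gym

env = gym.make('PredatorPrey-v0')  # assumed ID; see ic3net_envs/__init__.py for the real one
print(env)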

Optional: If you want to run experiments on StarCraft, install the gym-starcraft package included in this repository, following the instructions in the README inside that package.

Next, install the dependencies for IC3Net, including PyTorch. To do that, run:

pip install -r requirements.txt

Running

Some example scripts have been provided. The code has been changed a bit from the original IC3Net repo to support two things: (1) agents can now use discrete learnable prototypes for communication, and (2) agents can now train with a gating penalty (if specified), which enables them to learn sparse communication protocols even in fully cooperative scenarios.

Also, the repo now uses TensorBoard instead of Visdom for viewing training plots.
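
Once a run has started, the plots can be viewed with the standard TensorBoard command (the log directory is an assumption; point --logdir at wherever the training script writes its tensorboardX event files, which default to ./runs):

tensorboard --logdir runs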

For discrete communication, try out the script:

python run_pp_proto.py

I would recommend going through the script once to better understand the arguments. This script learns a discrete communication protocol with a fixed gating head (g=1). After training, it also plots graphs for rewards, success rates, and communication rates.
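
For intuition, here is a minimal, illustrative PyTorch sketch of prototype-based discrete communication; it is not the repo's actual code, and the class and variable names are made up. Each continuous message is snapped to the nearest entry of a learnable prototype table, with a straight-through estimator so gradients still reach the communication head.

import torch
import torch.nn as nn

class PrototypeLayer(nn.Module):
    # Hypothetical sketch: quantize continuous messages to learnable prototypes.
    def __init__(self, num_prototypes, msg_dim):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, msg_dim))

    def forward(self, msg):
        # msg: (batch, msg_dim) continuous messages from the communication head.
        dists = torch.cdist(msg, self.prototypes)   # (batch, num_prototypes)
        idx = dists.argmin(dim=1)                   # index of the nearest prototype
        quantized = self.prototypes[idx]            # (batch, msg_dim)
        # Straight-through estimator: the forward pass uses the discrete prototype,
        # while gradients flow back through the continuous message.
        return msg + (quantized - msg).detach()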

Similarly, to try out the gating penalty approach, use:

python run_g0.01.py
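
As a rough illustration (assumed, not the repo's implementation; the function and variable names are hypothetical), a gating penalty simply adds a cost proportional to how often agents open the communication gate:

import torch

def loss_with_gating_penalty(policy_loss, gate_probs, gating_penalty=0.01):
    # gate_probs: per-agent probabilities of communicating (output of the gating head).
    # Penalizing the expected amount of communication pushes agents toward sparse
    # protocols, so they communicate only when it actually improves the return.
    return policy_loss + gating_penalty * gate_probs.mean()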

Similarly, you can write training scripts for other environments; one for the Traffic-Junction environment is also included.

License

Code is available under MIT license.