I2C

Installation

  • Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), tensorflow (1.14.0), numpy (1.18.2)
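
A minimal install sketch, assuming the dependencies are installed with pip into an existing Python 3.5 environment (the repository does not prescribe a particular package manager):

  pip install gym==0.10.5 tensorflow==1.14.0 numpy==1.18.2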

Environment options

  • --scenario: defines which environment in the MPE is to be used (default: "cn")

  • --max-episode-len: maximum length of each episode for the environment (default: 25)

  • --num-episodes: total number of training episodes (default: 60000)

  • --num-adversaries: number of adversaries in the environment (default: 0)
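
As an illustration, these options are passed to train.py on the command line; the values below simply restate the defaults:

  python3 train.py --scenario 'cn' --max-episode-len 25 --num-episodes 60000 --num-adversaries 0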

Core training parameters

  • --lr: learning rate (default: 1e-2)

  • --gamma: discount factor (default: 0.95)

  • --batch-size: batch size (default: 800)

  • --num-units: number of units in the MLP (default: 128)
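
For example, to override the core training parameters (the values here are illustrative, not recommended settings):

  python3 train.py --scenario 'cn' --lr 1e-3 --gamma 0.95 --batch-size 1024 --num-units 256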

Training for prior network

  • --prior-buffer-size: prior network training buffer size

  • --prior-num-iter: prior network training iterations

  • --prior-training-rate: prior network training rate

  • --prior-training-percentile: percentile that sets the threshold on the KL value used to generate labels
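
No defaults are listed for these flags. A sketch of how they might be set on the command line (the numeric values are placeholders, except the percentile, which matches the Cooperative Navigation command under "Training procedure"):

  python3 train.py --scenario 'cn' --prior-buffer-size 10000 --prior-num-iter 2000 --prior-training-rate 1e-3 --prior-training-percentile 60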

Checkpointing

  • --exp-name: name of the experiment, used as the file name to save all results (default: None)

  • --save-dir: directory where intermediate training results and model will be saved (default: "/tmp/policy/")

  • --save-rate: model is saved every time this number of episodes has been completed (default: 1000)

  • --load-dir: directory where training state and model are loaded from (default: "")

  • --plots-dir: directory where training curves are saved (default: "./learning_curves/")

  • --restore_all: whether to restore an existing I2C network
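
A checkpointing sketch: the first command saves intermediate results under an experiment name every 1000 episodes, and the second resumes from that directory (the experiment name and paths are hypothetical):

  python3 train.py --scenario 'cn' --exp-name cn_run1 --save-dir './policy/cn_run1/' --save-rate 1000

  python3 train.py --scenario 'cn' --exp-name cn_run1 --load-dir './policy/cn_run1/'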

Training procedure

I2C can be learned either end-to-end or in a two-phase manner. This code implements the end-to-end approach, which can take more training time than the two-phase approach.

For Cooperative Navigation:

  python3 train.py --scenario 'cn' --prior-training-percentile 60 --lr 1e-2

For Predator-Prey:

  python3 train.py --scenario 'pp' --prior-training-percentile 40 --lr 1e-3

Citations

If you are using this code, please cite our paper.

Ziluo Ding, Tiejun Huang, and Zongqing Lu. Learning Individually Inferred Communication for Multi-Agent Cooperation. NeurIPS'20.

@inproceedings{ding2020learning,
    title={Learning Individually Inferred Communication for Multi-Agent Cooperation},
    author={Ding, Ziluo and Huang, Tiejun and Lu, Zongqing},
    booktitle={NeurIPS},
    year={2020}
}

Acknowledgements

This code is based on the MADDPG source code by Ryan Lowe.