This repository provides the full implementation for the paper "Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming," published at the International Conference on Learning Representations (ICLR) 2022.

Authors: Sachin Konan*, Esmaeil Seraj*, Matthew Gombolay

\* Co-first authors; these authors contributed equally to this work.

Full paper (arXiv): https://arxiv.org/pdf/2201.08484.pdf
## Setup

- Download Anaconda, then create the conda environment:
  - `conda env create --file marl.yml`
- Install the bundled PettingZoo package inside the environment:
  - `cd PettingZoo`
  - `conda activate marl`
  - `python setup.py install`
- Follow the StarCraft Multi-Agent Challenge (SMAC) installation instructions here: https://github.com/oxwhirl/smac
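The setup steps above can be collected into one script. The sketch below is a dry run that only prints each command (swap the `echo` for real execution); it assumes `marl.yml` and the bundled `PettingZoo/` directory sit at the repository root.

```shell
# Dry-run sketch of the setup sequence above: commands are printed, not executed.
# To actually run them, change 'echo "+ $*"' to "$@".
run() { echo "+ $*"; }

run conda env create --file marl.yml   # build the 'marl' environment from marl.yml
run cd PettingZoo                      # the repo bundles its own PettingZoo copy
run conda activate marl
run python setup.py install            # install PettingZoo into the marl env
```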
## Pistonball

- `cd pistonball`
- To Execute Experiments:
  - MOA: `python test_piston_ball.py -method moa`
  - InfoPG: `python test_piston_ball.py -method infopg -k [K_LEVELS]`
  - Adv. InfoPG: `python test_piston_ball.py -method infopg_adv -k [K_LEVELS]`
  - Consensus Update: `python test_piston_ball.py -method consensus`
  - Standard A2C: `python test_piston_ball.py -method a2c`
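To run every pistonball baseline back to back, a small loop over the `-method` values above can help. This is a hypothetical convenience sketch, not a script in the repo; `K_LEVELS=2` is chosen purely for illustration.

```shell
# Run all pistonball baselines sequentially (dry run: 'echo' prints each
# command; remove it to execute). K_LEVELS=2 is an arbitrary example value.
K_LEVELS=2
for method in moa consensus a2c; do
  echo python test_piston_ball.py -method "$method"
done
for method in infopg infopg_adv; do          # these two also need a k-level
  echo python test_piston_ball.py -method "$method" -k "$K_LEVELS"
done
```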
- To Execute PR2-AC Experiments:
  - `cd ../pr2-ac/pistonball/`
  - `python distributed_pistonabll_train.py -batch 4 -workers [NUM CPUS]`
  - Results will be saved in `experiments/pistonball/[DATETIME OF RUN]/`
  - `cd` back to the `pistonball` directory for the remaining experiments
- To Execute Piston-Case Batch Experiments:
  - MOA: `python batch_pistoncase_moa_env.py`
  - InfoPG: `python batch_pistoncase_infopg_env.py`
## Pong

- `cd pong`
- To Execute MOA Experiments:
  - `cd pong_moa`
  - MOA: `python distributed_pong_moa_train.py -batch 16 -workers [NUM CPUS]`
  - Results will be saved in `experiments/pong/[DATETIME OF RUN]/`
- To Execute PR2-AC Experiments:
  - `cd ../pr2-ac/pong/`
  - `python distributed_pong_train.py -batch 16 -workers [NUM CPUS]`
  - Results will be saved in `experiments/pong/[DATETIME OF RUN]/`
  - `cd` back to the `pong` directory for the remaining experiments
- To Execute Other Experiments:
  - InfoPG: `python distributed_pong_train.py -batch 16 -workers [NUM CPUS] -k [K_LEVELS] -adv info -critic`
  - Adv. InfoPG: `python distributed_pong_train.py -batch 16 -workers [NUM CPUS] -k [K_LEVELS] -adv normal`
  - Consensus Update: `python distributed_pong_train.py -batch 16 -workers [NUM CPUS] -k 0 -adv normal -consensus`
  - Standard A2C: `python distributed_pong_train.py -batch 16 -workers [NUM CPUS] -k 0 -adv normal`
  - Results will be saved in `experiments/pong/[DATETIME OF RUN]/`
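Since each run is saved under a `[DATETIME OF RUN]` folder, the most recent pong run can be located by sorting the directory names, assuming the run folders use sortable timestamp names. A small sketch (the `mkdir` line only fabricates example runs for illustration):

```shell
# Locate the newest run directory under experiments/pong/.
# The mkdir line only fabricates example datetime-named runs for illustration.
mkdir -p experiments/pong/2022-01-01_12-00-00 experiments/pong/2022-01-02_12-00-00
latest=$(ls -d experiments/pong/*/ | sort | tail -n 1)   # timestamp names sort chronologically
echo "latest run: $latest"
```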
## Walker

- `cd walker`
- To Execute MOA Experiments:
  - `cd walker_moa`
  - MOA: `python distributed_walker_train_moa.py -batch 16 -workers [NUM CPUS]`
  - Results will be saved in `experiments/walker_moa/[DATETIME OF RUN]/`
- To Execute PR2-AC Experiments:
  - `cd ../pr2-ac/walker/`
  - `python distributed_walker_train.py -batch 16 -workers [NUM CPUS]`
  - Results will be saved in `experiments/walker/[DATETIME OF RUN]/`
  - `cd` back to the `walker` directory for the remaining experiments
- To Execute Other Experiments:
  - InfoPG: `python distributed_walker_train.py -batch 16 -workers [NUM CPUS] -k [K_LEVELS] -adv info -critic`
  - Adv. InfoPG: `python distributed_walker_train.py -batch 16 -workers [NUM CPUS] -k [K_LEVELS] -adv normal`
  - Consensus Update: `python distributed_walker_train.py -batch 16 -workers [NUM CPUS] -k 0 -adv normal -consensus`
  - Standard A2C: `python distributed_walker_train.py -batch 16 -workers [NUM CPUS] -k 0 -adv normal`
  - Results will be saved in `experiments/walker/[DATETIME OF RUN]/`
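The `[NUM CPUS]` placeholder can be filled in automatically from the machine's core count. A sketch using `nproc` (with POSIX `getconf` as a fallback); the `-k 0 -adv normal` flags just reuse the Standard A2C variant above as the example:

```shell
# Derive -workers from the available CPU count instead of hard-coding it.
workers=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
# Dry run: print the resulting walker command (remove 'echo' to execute).
echo python distributed_walker_train.py -batch 16 -workers "$workers" -k 0 -adv normal
```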
## StarCraft

- `cd starcraft`
- To Execute MOA Experiments:
  - `cd moa`
  - MOA: `python distributed_starcraft_train_moa.py -batch 128 -workers [NUM CPUS] -positive_rewards`
  - Results will be saved in `experiments/starcraft/[DATETIME OF RUN]/`
- To Execute PR2-AC Experiments:
  - `cd ../pr2-ac/starcraft/`
  - `python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS]`
  - Results will be saved in `experiments/starcraft/[DATETIME OF RUN]/`
  - `cd` back to the `starcraft` directory for the remaining experiments
- To Execute Other Experiments:
  - InfoPG: `python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS] -k [K_LEVELS] -adv info -critic -positive_rewards`
  - Adv. InfoPG: `python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS] -k [K_LEVELS] -adv normal -positive_rewards`
  - Consensus Update: `python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS] -k 0 -adv normal -consensus -positive_rewards`
  - Standard A2C: `python distributed_starcraft_train.py -batch 128 -workers [NUM CPUS] -k 0 -adv normal -positive_rewards`
  - Results will be saved in `experiments/starcraft/[DATETIME OF RUN]/`