We propose CollaQ, a novel way to decompose Q function for decentralized policy in multi-agent modeling. In StarCraft II Multi-Agent Challenge, CollaQ outperforms existing state-of-the-art techniques (i.e., QMIX, QTRAN, and VDN) by improving the win rate by 40% with the same number of samples. In the more challenging ad hoc team play setting (i.e., reweight/add/remove units without re-training or finetuning), CollaQ outperforms previous SoTA by over 30%.
Please check our website for comprehensive results.
3-min video for paper introduction.
Please cite our arXiv paper if you use this codebase:
@article{zhang2020multi,
title={Multi-Agent Collaboration via Reward Attribution Decomposition},
author={Zhang, Tianjun and Xu, Huazhe and Wang, Xiaolong and Wu, Yi and Keutzer, Kurt and Gonzalez, Joseph E and Tian, Yuandong},
journal={arXiv preprint arXiv:2010.08531},
year={2020}
}
Note: We are using SC2.4.6.2 and all baselines are run by ourselves using this version of SC2.
git clone git@github.com:facebookresearch/CollaQ.git
The requirements.txt file can be used to install the necessary packages into a virtual environment with python == 3.6.0 (not recomended).
Install smac and sacred:
git submodule sync && git submodule update --init --recursive
cd third_party/sacred
git apply ../sacred.patch
cd ../smac
git apply ../smac.patch
cd ../pymarl
git apply ../pymarl.patch
Building up src folder for code
cd ../..
cp -r third_party/pymarl/src .
cp src_code/config/* src/config/algs/
cp src_code/controllers/* src/controllers/
cp src_code/learners/* src/learners/
cp src_code/modules/* src/modules/agents/
SC2PATH=.../pymarl/StarCraftII
python src/main.py --config=qmix --env-config=sc2 with env_args.map_name=MMM2,
python src/main.py --config=qmix_interactive_reg --env-config=sc2 with env_args.map_name=MMM2,
python src/main.py --config=qmix_interactive_reg_attn --env-config=sc2 with env_args.map_name=MMM2,
python src/main.py --config=qmix_interactive_reg_attn --env-config=sc2 with env_args.map_name=29m_vs_30m,28m_vs_30m, obs_agent_id=False
python src/main.py --config=qmix_interactive_reg_attn --env-config=sc2 with env_args.map_name=27m_vs_30m,28m_vs_30m, obs_agent_id=False
python src/main.py --config=qmix_interactive_reg_attn --env-config=sc2 with env_args.map_name=3s1z_vs_zg_easy, 1s3z_vs_zg_easy,2s2z_vs_zg_easy, obs_agent_id=False
They are all located in src/config
.
--config
refers to the config files in src/config/algs
--env-config
refers to the config files in src/config/envs
All results will be stored in the Results
folder.
python src/main.py --config=qmix --env-config=sc2 with env_args.map_name=5m_vs_6m, evaluate=True checkpoint_path=results/models/5m_vs_6m/... save_replay=True
You can save the learnt models to disk by setting save_model = True
, which is set to False
by default. The frequency of saving models can be adjusted using save_model_interval
configuration. Models will be saved in the result directory, under the folder called models. The directory corresponding each run will contain models saved throughout the experiment, each within a folder corresponding to the number of timesteps passed since starting the learning process.
Learnt models can be loaded using the checkpoint_path
parameter, after which the learning will proceed from the corresponding timestep.
save_replay
option allows saving replays of models which are loaded using checkpoint_path
. Once the model is successfully loaded, test_nepisode
number of episodes are run on the test mode and a .SC2Replay file is saved in the Replay directory of StarCraft II. Please make sure to use the episode runner if you wish to save a replay, i.e., runner=episode
. The name of the saved replay file starts with the given env_args.save_replay_prefix
(map_name if empty), followed by the current timestamp.
The saved replays can be watched by double-clicking on them or using the following command:
python -m pysc2.bin.play --norender --rgb_minimap_size 0 --replay NAME.SC2Replay
Note: Replays cannot be watched using the Linux version of StarCraft II. Please use either the Mac or Windows version of the StarCraft II client.
Our vanilla RL algorithm is based on PyMARL, which is an open source implementation of algorithms in StarCraft II.
This code is under the CC-BY-NC 4.0 (Attribution-NonCommercial 4.0 International) license.