This is the code for the experiments in the paper Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery, published at AAMAS 2020. Ablations and baselines are included. The STS2 game will be released in a separate repository in the near future.
## Requirements

- Python >= 3.5.2
- TensorFlow 1.13.1
- PyGame 1.9.4
## Code structure

- `alg`: implementation of algorithms, neural networks, and `config.json` containing all hyperparameters.
- `env`: implementation of a multi-agent wrapper around the STS2 simulator.
- `results`: each experiment will create a subfolder that contains log files recorded during training and eval.
- `test`: test scripts.
Each algorithm named `alg_*.py` is run through a script named `train_*.py`. The pairings are as follows:

- `train_hsd.py` runs `alg_hsd.py` (HSD)
- `train_offpolicy.py` runs `alg_qmix.py` (QMIX) and `alg_iql.py` (IQL)
- `train_hsd_scripted.py` runs `alg_hsd_scripted.py`
## Training

To do multi-seed runs that sweep over the initial random seed, set the appropriate choices in `config.json` and use `train_multiprocess.py`. See the example below.
For all algorithms:

- Activate your TensorFlow environment (if using `virtualenv`) and allocate a GPU with `export CUDA_VISIBLE_DEVICES=<n>`, where `n` is the GPU number. `cd` into the `alg` folder.
- Execute the training script, e.g. `python train_hsd.py`.
- Periodic training progress is logged in `log.csv`, along with saved models, under `results/<dir_name>` (a quick inspection sketch follows this list).
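A minimal sketch for inspecting training progress from Python. This helper is not part of the repository; it assumes the run was saved under `results/hsd_1` and that `log.csv` has a header row (the exact columns depend on the algorithm):

```python
import csv

# Hypothetical inspection helper: list what has been logged so far.
# Assumes results/hsd_1 exists and log.csv has a header row; the exact
# column names depend on which algorithm was trained.
with open("results/hsd_1/log.csv") as f:
    rows = list(csv.DictReader(f))

if rows:
    print("Logged columns:", list(rows[0].keys()))
    print("Number of logged entries:", len(rows))
```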
### HSD

- Select the correct settings in `alg/config.json`; refer to `config_hsd.json` for an example. The key parameters to set are (a sanity-check sketch follows this list):
  - `"alg_name" : "hsd"`
  - everything under `"h_params"`
  - neural network parameters under `"nn_hsd"`
### QMIX and IQL

- Select the correct settings in `alg/config.json`; refer to `config_qmix.json` for an example. The key parameters to set are:
  - `"alg_name" : "qmix"`
  - neural network parameters under `"nn_qmix"`
### Multi-seed example

For example, to conduct 5 parallel runs with seeds 12341, 12342, ..., 12345, saved into directories hsd_1, hsd_2, ..., hsd_5 (all under `results/`), set the following parameters in `config.json` (a scripted version follows this list):

- `"N_seeds" : 5`
- `"seed" : 12341`
- `"dir_name" : "hsd"`
- `"dir_idx_start" : 1`
## Testing

- Choose appropriate settings in `alg/config.json`:
  - `"dir_name" : "hsd_1"`
  - `"model_name" : "model_good.ckpt-<some number>"`
  - `"render" : true` (to see PyGame)
  - `"N_test" : 100` (for 100 test episodes)
  - `"measure" : true` (to enable generation of additional .csv files for analysis of behavior)
- `cd` into the `alg` folder and execute the test script: `python test.py`.
- Results will be stored in `test.csv` under `results/<dir_name>/`. If `"measure" : true`, the files `matrix_role_counts.pkl`, `count_skills.pkl`, and `count_low_actions.pkl` will also be generated (see the loading sketch below).
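To start analyzing those files, a minimal loading sketch, assuming the run directory is `results/hsd_1` (the internal structure of each pickled object is not documented in this README, so only its type is reported):

```python
import pickle

# Sketch for loading the analysis files produced when "measure" : true.
# Assumes the run directory is results/hsd_1; the structure of each object
# is not documented here, so we only report its type as a starting point.
for name in ("matrix_role_counts.pkl", "count_skills.pkl", "count_low_actions.pkl"):
    with open("results/hsd_1/%s" % name, "rb") as f:
        obj = pickle.load(f)
    print(name, "->", type(obj).__name__)
```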
## Citation

    @article{yang2019hierarchical,
      title={Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery},
      author={Yang, Jiachen and Borovikov, Igor and Zha, Hongyuan},
      journal={arXiv preprint arXiv:1912.03558},
      year={2019}
    }
## License

HSD is distributed under the terms of the BSD-3 license. All new contributions must be made under this license.
See LICENSE for details.
SPDX-License-Identifier: BSD-3