Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery (HSD)

This is the code for the experiments in the paper Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery, published at AAMAS 2020. Ablations and baselines are included. The STS2 game will be released in a separate repository in the near future.

Prerequisites

  • Python version >= 3.5.2
  • TensorFlow 1.13.1
  • PyGame 1.9.4

Project structure

  • alg : implementation of the algorithms and neural networks; config.json contains all hyperparameters.
  • env : implementation of the multi-agent wrapper around the STS2 simulator.
  • results : each experiment creates a subfolder that contains log files recorded during training and evaluation.
  • test : test scripts.

Training

Each algorithm alg_*.py is run through a training script train_*.py. The pairings are as follows:

  • train_hsd.py runs alg_hsd.py (HSD)
  • train_offpolicy.py runs alg_qmix.py (QMIX) and alg_iql.py (IQL)
  • train_hsd_scripted.py runs alg_hsd_scripted.py

To do multi-seed runs that sweep over the initial random seed, set the appropriate choices in config.json and use train_multiprocess.py. See Example 3 below.

For all algorithms,

  • Activate your TensorFlow environment (if using virtualenv) and select a GPU using export CUDA_VISIBLE_DEVICES=<n>, where <n> is the GPU index.
  • cd into the alg folder.
  • Execute the training script, e.g. python train_hsd.py.
  • Training progress is logged periodically to log.csv, and models are saved, under results/<dir_name>.

Example 1: training HSD

  • Select the correct settings in alg/config.json; refer to config_hsd.json for an example. The key parameters to set are the following (see the sketch after this list):
    • "alg_name" : "hsd"
    • everything under "h_params"
    • the neural network parameters under "nn_hsd"
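
A minimal sketch of the relevant entries in alg/config.json; the contents of "h_params" and "nn_hsd" are elided here, so refer to config_hsd.json for a complete example:

{
  "alg_name" : "hsd",
  "h_params" : { ... },
  "nn_hsd" : { ... }
}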

Example 2: training QMIX

  • Select the correct settings in alg/config.json; refer to config_qmix.json for an example. The key parameters to set are the following (see the sketch after this list):
    • "alg_name" : "qmix"
    • the neural network parameters under "nn_qmix"
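
Correspondingly, a minimal sketch for QMIX; the network parameters are elided here, so refer to config_qmix.json for a complete example:

{
  "alg_name" : "qmix",
  "nn_qmix" : { ... }
}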

Example 3: multi-seed runs

For example, to conduct 5 parallel runs with seeds 12341, 12342, ..., 12345 and save them into directories hsd_1, hsd_2, ..., hsd_5 (all under results/), set the following parameters in config.json:

  • "N_seeds" : 5
  • "seed" : 12341
  • "dir_name" : "hsd"
  • "dir_idx_start" : 1

Testing

Example 1: testing HSD

  • Choose the appropriate settings in alg/config.json (a sketch of these settings appears at the end of this section).

    • "dir_name" : "hsd_1"
    • "model_name" : "model_good.ckpt-<some number>"
    • "render" : true (to see PyGame)
    • "N_test" : 100 (for 100 test episodes)
    • "measure" : true (to enable generation of additional .csv files for analysis of behavior)
  • cd into the alg folder and execute the test script: python test.py

  • Results will be stored in test.csv under results/<dir_name>/. If "measure" : true, then files matrix_role_counts.pkl, count_skills.pkl and count_low_actions.pkl will also be generated.
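
For reference, here is a sketch of the test settings above as they would appear in alg/config.json (the checkpoint number depends on your training run):

{
  "dir_name" : "hsd_1",
  "model_name" : "model_good.ckpt-<some number>",
  "render" : true,
  "N_test" : 100,
  "measure" : true
}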

Citation

@article{yang2019hierarchical,
  title={Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery},
  author={Yang, Jiachen and Borovikov, Igor and Zha, Hongyuan},
  journal={arXiv preprint arXiv:1912.03558},
  year={2019}
}

License

HSD is distributed under the terms of the BSD 3-Clause license. All new contributions must be made under this license.

See LICENSE for details.

SPDX-License-Identifier: BSD-3-Clause