pymarl2

Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning.


RIIT

Open-source code for Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning. Our goal is to call for a fair comparison of the performance of MARL algorithms.

Code-level Optimizations

There are many code-level tricks in Multi-Agent Reinforcement Learning (MARL), such as the ones below (a minimal sketch combining several of them follows the list):

  • Value function clipping (clip max Q values for QMIX)
  • Value normalization
  • Reward scaling
  • Orthogonal initialization and layer scaling
  • Adam optimizer
  • Learning rate annealing
  • Reward clipping
  • Observation normalization
  • Gradient clipping
  • Large batch size
  • N-step returns (including GAE($\lambda$) and Q($\lambda$))
  • Rollout process number
  • $\epsilon$-greedy annealing steps
  • Death agent masking
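
As a rough illustration of how several of these tricks fit together, here is a minimal PyTorch sketch. The network shape, hyperparameter values, and helper names are placeholders for illustration only, not the exact settings used in this repository:

import torch
import torch.nn as nn

# Toy Q-network; the layer sizes are placeholders.
q_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 5))

# Orthogonal initialization and layer scaling.
for layer in q_net:
    if isinstance(layer, nn.Linear):
        nn.init.orthogonal_(layer.weight, gain=2 ** 0.5)
        nn.init.zeros_(layer.bias)

# Adam optimizer with linear learning rate annealing.
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.1, total_iters=10_000)

def td_update(obs, targets, q_clip=1000.0, grad_clip=10.0):
    """One TD update combining value clipping and gradient clipping."""
    q_values = q_net(obs).max(dim=-1).values
    # Value function clipping: bound the max Q-values.
    q_values = q_values.clamp(max=q_clip)
    loss = ((q_values - targets.detach()) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping before the optimizer step.
    torch.nn.utils.clip_grad_norm_(q_net.parameters(), max_norm=grad_clip)
    optimizer.step()
    scheduler.step()
    return loss.item()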

Related Works

  • Implementation Matters in Deep RL: A Case Study on PPO and TRPO
  • What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
  • The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games

Finetuned-QMIX

Using a few of the tricks above, we enable QMIX to solve almost all of SMAC's scenarios.

| Scenarios | Difficulty | QMIX (batch_size=128) | OurQMIX |
|---|---|---|---|
| 8m | Easy | - | 100% |
| 2c_vs_1sc | Easy | - | 100% |
| 2s3z | Easy | - | 100% |
| 1c3s5z | Easy | - | 100% |
| 3s5z | Easy | - | 100% |
| 8m_vs_9m | Hard | 84% | 100% |
| 5m_vs_6m | Hard | 84% | 90% |
| 3s_vs_5z | Hard | 96% | 100% |
| bane_vs_bane | Hard | 100% | 100% |
| 2c_vs_64zg | Hard | 100% | 100% |
| corridor | Super Hard | 0% | 100% |
| MMM2 | Super Hard | 98% | 100% |
| 3s5z_vs_3s6z | Super Hard | 3% | 85% (Number of Envs = 4) |
| 27m_vs_30m | Super Hard | 56% | 100% |
| 6h_vs_8z | Super Hard | 0% | 93% ($\lambda$ = 0.3) |
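
The $\lambda$ in the last row is the eligibility-trace parameter of the Q($\lambda$) return listed among the tricks above. Below is a minimal sketch of a Q($\lambda$)-style target recursion, assuming a single trajectory of rewards, bootstrapped greedy Q-values, and terminal flags as NumPy arrays; it is not the exact implementation in this repository:

import numpy as np

def q_lambda_targets(rewards, next_max_q, terminated, gamma=0.99, td_lambda=0.3):
    """Backward recursion for Q(lambda)-style n-step targets.

    rewards, next_max_q, terminated: 1-D arrays of length T for one trajectory,
    where next_max_q[t] = max_a Q(s_{t+1}, a) from the target network.
    """
    T = len(rewards)
    targets = np.zeros(T)
    # Bootstrap from the greedy Q-value after the last step.
    g = next_max_q[-1]
    for t in reversed(range(T)):
        # Mix the one-step bootstrap with the longer return via lambda.
        g = rewards[t] + gamma * (1.0 - terminated[t]) * (
            (1.0 - td_lambda) * next_max_q[t] + td_lambda * g)
        targets[t] = g
    return targets

With td_lambda = 0 this reduces to the one-step TD target; larger values weight longer n-step returns more heavily.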

Re-Evaluation

Afterwards, we re-evaluate numerous QMIX variants with the tricks normalized (i.e., a general set of hyperparameters), and find that QMIX achieves state-of-the-art performance.

| Algo. | Type | 3s_vs_5z | 5m_vs_6m | 3s5z_vs_3s6z | corridor | 6h_vs_8z | MMM2 | Predator-Prey |
|---|---|---|---|---|---|---|---|---|
| OurQMIX | VB | 100% | 90% | 75% | 100% | 84% | 100% | 40 |
| OurVDNs | VB | 100% | 90% | 43% | 98% | 87% | 96% | 39 |
| OurQatten | VB | 100% | 90% | 62% | 100% | 68% | 100% | - |
| OurQPLEX | VB | 100% | 90% | 68% | 96% | 78% | 100% | 39 |
| OurWQMIX | VB | 100% | 90% | 6% | 96% | 78% | 23% | 39 |
| OurLICA | PG | 3% | 53% | 0% | 0% | 4% | 0% | 30 |
| OurDOP | PG | 0% | 9% | 0% | 0% | 1% | 0% | 32 |
| RIIT | PG | 96% | 67% | 75% | 100% | 19% | 100% | 38 |

Type: VB = value-based, PG = policy gradient.

PyMARL

PyMARL is WhiRL's framework for deep multi-agent reinforcement learning. This repository includes implementations of, among others, the following algorithms:

Value-based Methods:

  • QMIX
  • VDN
  • Qatten
  • QPLEX
  • WQMIX

Actor-Critic Methods:

  • LICA
  • DOP
  • RIIT

Installation instructions

Install Python packages

# require Anaconda 3 or Miniconda 3
bash install_dependecies.sh

Set up StarCraft II and SMAC:

bash install_sc2.sh

This will download SC2 into the 3rdparty folder and copy the maps necessary to run the experiments.

Command Line Tool

Run an experiment

# For SMAC
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor
# For Cooperative Predator-Prey
python3 src/main.py --config=qmix_prey --env-config=stag_hunt with env_args.map_name=stag_hunt

The config files act as defaults for an algorithm or environment. They are all located in src/config:

  • --config refers to the config files in src/config/algs
  • --env-config refers to the config files in src/config/envs
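
Hyperparameters from those config files can be overridden on the command line through the same with syntax. For example (an illustrative command; batch_size and epsilon_anneal_time are parameters referenced elsewhere in this README, and the values are placeholders):

# Override algorithm defaults from the command line
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=5m_vs_6m batch_size=128 epsilon_anneal_time=500000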

Run n parallel experiments

# bash run.sh config_name map_name_list (threads_num arg_list gpu_list experiments_num)
bash run.sh qmix corridor 2 epsilon_anneal_time=500000 0,1 5

Each xxx_list argument (e.g. map_name_list, arg_list, gpu_list) is comma-separated.
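
For example, to run two hard maps in parallel across both GPUs (an illustrative invocation; the map names and argument values are placeholders):

bash run.sh qmix 3s_vs_5z,5m_vs_6m 2 epsilon_anneal_time=500000 0,1 3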

All results will be stored in the Results folder and named after the map_name.

Kill all training processes

# All Python and game processes of the current user will be killed.
bash clean.sh

Cite

@article{hu2021rethinking,
      title={Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning}, 
      author={Jian Hu and Siyang Jiang and Seth Austin Harding and Haibin Wu and Shih-wei Liao},
      year={2021},
      eprint={2102.03479},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}