StarCraft
PyTorch implementations of multi-agent reinforcement learning algorithms, including IQL, QMIX, VDN, COMA, QTRAN (both QTRAN-base and QTRAN-alt), MAVEN, CommNet, DyMA-CL, and G2ANet, which are among the state-of-the-art MARL algorithms. Because CommNet and G2ANet need an external training algorithm, we provide Central-V and REINFORCE to train them; you can also combine them with COMA. We trained these algorithms on SMAC, the decentralised micromanagement scenario of StarCraft II.
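For intuition, the value-factorisation methods above (e.g. QMIX) combine per-agent Q-values into a joint Q_tot through a mixing network whose state-conditioned weights are kept non-negative, so maximising Q_tot is consistent with each agent maximising its own Q-value. The sketch below is illustrative only; the class name `MonotonicMixer` and its arguments are hypothetical and do not correspond to the modules in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """Minimal QMIX-style mixing sketch: hypernetworks generate non-negative
    mixing weights from the global state, enforcing monotonicity of Q_tot
    in each agent's Q-value."""

    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: map the global state to mixing weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        bs = agent_qs.size(0)
        agent_qs = agent_qs.view(bs, 1, self.n_agents)
        # Absolute value keeps mixing weights non-negative (monotonicity).
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs, w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2      # (batch, 1, 1)
        return q_tot.view(bs, 1)
```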
Corresponding Papers
- IQL: Independent Q-Learning
- QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
- Value-Decomposition Networks For Cooperative Multi-Agent Learning
- Counterfactual Multi-Agent Policy Gradients
- QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning
- Learning Multiagent Communication with Backpropagation
- From Few to More: Large-scale Dynamic Multiagent Curriculum Learning
- Multi-Agent Game Abstraction via Graph Attention Neural Network
- MAVEN: Multi-Agent Variational Exploration
Requirements
Acknowledgement
TODO List
- Add CUDA option
- DyMA-CL
- G2ANet
- MAVEN
- VBC
- Other SOTA MARL algorithms
- Update results on other maps
Quick Start
```shell
$ python main.py --map=3m --alg=qmix
```
Directly run main.py, and the algorithm will start training on map `3m`. Note that CommNet and G2ANet need an external training algorithm, so their names look like `reinforce+commnet` or `central_v+g2anet`. All the algorithms we provide are listed in `./common/arguments.py`.
If you just want to use this project for a demonstration, set `--evaluate=True --load_model=True`.
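For example, to watch a trained G2ANet model trained with Central-V (assuming a model for this algorithm and map has already been saved):

```shell
$ python main.py --map=3m --alg=central_v+g2anet --evaluate=True --load_model=True
```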
DyMA-CL runs independently of the other algorithms because it requires different environment settings, so we keep it in a separate project. For more details, please read the DyMA-CL documentation.
Result
We train each algorithm independently 8 times and take the mean of the 8 runs, evaluating for 20 episodes every 100 training steps. All of the results are saved in `./result`.
Results on other maps are still training; we will update them later.
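The curves below are the mean over the 8 independent runs. A minimal sketch of that aggregation, assuming each run's evaluation win rates are stored as a NumPy array (the actual file names and layout under `./result` may differ):

```python
import numpy as np

# Hypothetical file names; adapt them to the actual contents of ./result.
runs = [np.load(f"./result/qmix/3m/win_rates_{i}.npy") for i in range(8)]
win_rates = np.stack(runs)              # shape: (8, n_evaluations)
mean_win_rate = win_rates.mean(axis=0)  # mean over the 8 independent runs
print(mean_win_rate)
```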
1. Mean Win Rate of 8 Independent Runs on `3m --difficulty=7 (VeryHard)`
2. Mean Win Rate of 8 Independent Runs on `8m --difficulty=7 (VeryHard)`
3. Mean Win Rate of 8 Independent Runs on `2s3z --difficulty=7 (VeryHard)`
Replay
If you want to watch a replay, make sure `replay_dir` (set in `./common/arguments.py`) is an absolute path. The replay of each evaluation will then be saved, and you can find it under that path.
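For example, assuming `replay_dir` is also exposed as a command-line argument (it is defined in `./common/arguments.py`) and using a placeholder path:

```shell
$ python main.py --map=3m --alg=qmix --evaluate=True --load_model=True --replay_dir=/absolute/path/to/replay
```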