Gym-μRTS: Toward Affordable Deep Reinforcement Learning Research in Real-time Strategy Games (CoG 2021)
This repo contains the code for the paper Gym-μRTS: Toward Affordable Deep Reinforcement Learning Research in Real-time Strategy Games.
Make sure you have ffmpeg and jdk>=1.8.0 installed. Then install the dependencies:
git clone https://github.com/vwxyzjn/gym-microrts-paper
cd gym-microrts-paper
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
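If you want a quick sanity check that the prerequisites are visible to your environment, the following snippet (my own sketch, not part of the repo) should run without errors:

import shutil
import subprocess

# ffmpeg is needed for --capture-video; the JDK runs the underlying microRTS Java engine.
assert shutil.which("ffmpeg") is not None, "ffmpeg not found on PATH"
assert shutil.which("java") is not None, "java (JDK) not found on PATH"
subprocess.run(["java", "-version"], check=True)  # should report >= 1.8.0

import gym_microrts  # raises ImportError if the pip install did not succeed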
Note that the experiments were done with gym_microrts==0.3.2. As we move beyond v0.4.x, we plan to deprecate UAS despite its better performance in the paper, because UAS has a more complex implementation that makes it difficult to incorporate self-play or imitation learning in the future.
PPO + invalid action masking + diverse bots + IMPALA-CNN (our best agent)
python ppo_diverse_impala.py --capture-video
PPO + invalid action masking + diverse bots
python ppo_diverse.py --capture-video
PPO + invalid action masking
python ppo_coacai.py --capture-video
PPO + partial invalid action masking
python ppo_coacai_partial_mask.py --capture-video
PPO
python ppo_coacai_no_mask.py --capture-video
PPO + invalid action masking + half self-play / half bots + encoder-decoder
python ppo_gridnet_diverse_encode_decode.py --capture-video --num-bot-envs 8 --num-selfplay-envs 16 --exp-name ppo_gridnet_selfplay_diverse_encode_decode
PPO + invalid action masking + selfplay + encoder-decoder
python ppo_gridnet_diverse_encode_decode.py --capture-video --num-bot-envs 0 --num-selfplay-envs 24 --exp-name ppo_gridnet_selfplay_encode_decode
PPO + invalid action masking + diverse bots + encoder-decoder
python ppo_gridnet_diverse_encode_decode.py --capture-video
PPO + invalid action masking + diverse bots + IMPALA-CNN
python ppo_gridnet_diverse_impala.py --capture-video
PPO + invalid action masking + diverse bots
python ppo_gridnet_diverse.py --capture-video
PPO + invalid action masking
python ppo_gridnet_coacai.py --capture-video
PPO + partial invalid action masking
python ppo_gridnet_coacai_partial_mask.py --capture-video
PPO
python ppo_gridnet_coacai_no_mask.py --capture-video
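The "invalid action masking" referenced in the script names above works by zeroing out the probability of actions that are not legal in the current state before sampling. The scripts implement this inside the policy's action distribution; the snippet below is only a minimal sketch of the idea (hypothetical names, assuming a PyTorch Categorical policy head), not the repo's exact implementation:

import torch
from torch.distributions import Categorical

def masked_categorical(logits: torch.Tensor, mask: torch.Tensor) -> Categorical:
    # Replace the logits of invalid actions with a large negative number so that
    # their probability after the softmax is numerically zero.
    masked_logits = torch.where(mask.bool(), logits, torch.full_like(logits, -1e8))
    return Categorical(logits=masked_logits)

logits = torch.randn(1, 6)                 # unnormalized action scores from the policy
mask = torch.tensor([[1, 1, 0, 0, 1, 0]])  # 1 = valid action, 0 = invalid
action = masked_categorical(logits, mask).sample()  # never samples an invalid action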
We use Weights and Biases for experiment management; it syncs the training metrics, videos of the agents playing the game, and the trained models produced by our scripts. You can enable this feature by passing the --prod-mode flag to the scripts above.
For example, try running
python ppo_diverse_impala.py --capture-video --prod-mode --wandb-project gym-microrts-paper
and you should see outputs similar to the following:
wandb: Currently logged in as: costa-huang (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.10.25 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.10.24
wandb: Syncing run MicrortsDefeatCoacAIShaped-v3__ppo_diverse_impala__1__1618184644
wandb: ⭐️ View project at https://wandb.ai/vwxyzjn/gym-microrts-paper
wandb: 🚀 View run at https://wandb.ai/vwxyzjn/gym-microrts-paper/runs/2gw2f8tl
wandb: Run data is saved locally in /home/costa/Documents/work/go/src/github.com/vwxyzjn/gym-microrts-paper/wandb/run-20210411_194404-lokq7jxs
wandb: Run `wandb offline` to turn off syncing.
Once the agents are trained with --prod-mode turned on, you can go to the experiment page to download the trained model, which can then be used for evaluation. For example, you can download this experiment's agent.pt.
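If you prefer to script the download instead of clicking through the dashboard, the wandb public API can fetch the file. A short sketch (the run path below is the example run shown in the logs above; substitute your own):

import wandb

api = wandb.Api()
run = api.run("vwxyzjn/gym-microrts-paper/2gw2f8tl")  # entity/project/run_id
run.file("agent.pt").download(replace=True)           # downloads the checkpoint locally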
This repo comes with pre-trained models in the trained_models directory. To run the evaluation for PPO + invalid action masking + diverse bots + IMPALA-CNN, for example, try running
curl -O https://microrts.s3.amazonaws.com/microrts/gym-microrts-paper/trained_models.zip && unzip trained_models.zip
python agent_eval.py --exp-name ppo_diverse_impala \
--agent-model-path trained_models/ppo_diverse_impala/agent-2.pt \
--max-steps 4000 --num-eval-runs 100 \
--wandb-project-name gym-microrts-paper-eval \
--prod-mode --capture-video
To see how we run all the evaluations, check out agent_eval.sh.
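agent_eval.py takes care of constructing the network and loading the checkpoint for you. If you just want to peek inside a downloaded checkpoint, here is a minimal sketch (assuming it is a standard PyTorch file):

import torch

# Load on CPU so no GPU is required just to inspect the file.
checkpoint = torch.load("trained_models/ppo_diverse_impala/agent-2.pt", map_location="cpu")
print(type(checkpoint))
if isinstance(checkpoint, dict):
    # A state_dict: parameter names and shapes of the saved agent.
    for name, tensor in checkpoint.items():
        print(name, tuple(tensor.shape))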
Check out the code in the plots folder. Try running
curl -O https://microrts.s3.amazonaws.com/microrts/gym-microrts-paper/all_data.csv && mv all_data.csv plots/all_data.csv
python plot_ablation.py
python plot_all.py
python plot_hist.py
python plot_shaped_vs_sparse.py
python plot_uas_vs_gridnet.py
The CSV data is obtained either through the wandb export API or directly from the wandb dashboard, such as the "Ablation Studies" report.
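As a rough sketch of what the export API can do (this snippet does not reproduce the exact columns of all_data.csv), the wandb public API lets you pull run histories into pandas:

import pandas as pd
import wandb

api = wandb.Api()
runs = api.runs("vwxyzjn/gym-microrts-paper")  # entity/project from the logs above

frames = []
for run in runs:
    history = run.history()   # sampled training metrics as a pandas DataFrame
    history["run_name"] = run.name
    frames.append(history)

pd.concat(frames, ignore_index=True).to_csv("exported_runs.csv", index=False)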
Please use the following BibTeX entry:
@inproceedings{huang2021gym,
author = {Shengyi Huang and
Santiago Onta{\~{n}}{\'{o}}n and
Chris Bamford and
Lukasz Grela},
title = {Gym-{\(\mathrm{\mu}\)}RTS: Toward Affordable Full Game Real-time Strategy
Games Research with Deep Reinforcement Learning},
booktitle = {2021 {IEEE} Conference on Games (CoG), Copenhagen, Denmark, August
17-20, 2021},
pages = {1--8},
publisher = {{IEEE}},
year = {2021},
url = {https://doi.org/10.1109/CoG52621.2021.9619076},
doi = {10.1109/CoG52621.2021.9619076},
timestamp = {Fri, 10 Dec 2021 10:41:01 +0100},
biburl = {https://dblp.org/rec/conf/cig/HuangO0G21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}