Gym-μRTS: Toward Affordable Deep Reinforcement Learning Research in Real-time Strategy Games (CoG 2021)
This repo contains the code for the paper Gym-μRTS: Toward Affordable Deep Reinforcement Learning Research in Real-time Strategy Games.
Make sure you have ffmpeg and jdk>=1.8.0 installed. Then install the dependencies:
git clone https://github.com/vwxyzjn/gym-microrts-paper
cd gym-microrts-paper
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
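If you want a quick sanity check that the prerequisites are visible to your environment, the following snippet (my own sketch, not part of the repo) should run without errors:

import shutil
import subprocess

# ffmpeg is needed for --capture-video; the JDK runs the underlying microRTS Java engine.
assert shutil.which("ffmpeg") is not None, "ffmpeg not found on PATH"
assert shutil.which("java") is not None, "java (JDK) not found on PATH"
subprocess.run(["java", "-version"], check=True)  # should report >= 1.8.0

import gym_microrts  # raises ImportError if the pip install did not succeed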
Note that the experiments were done with gym_microrts==0.3.2. As we move beyond v0.4.x, we plan to deprecate UAS despite its better performance in the paper, because UAS has a more complex implementation that makes it difficult to incorporate self-play or imitation learning in the future.
PPO + invalid action masking + diverse bots + IMPALA-CNN (our best agent)
python ppo_diverse_impala.py --capture-video
PPO + invalid action masking + diverse bots
python ppo_diverse.py --capture-video
PPO + invalid action masking
python ppo_coacai.py --capture-video
PPO + partial invalid action masking
python ppo_coacai_partial_mask.py --capture-video
PPO
python ppo_coacai_no_mask.py --capture-video
PPO + invalid action masking + half self-play / half bots + encoder-decoder
python ppo_gridnet_diverse_encode_decode.py --capture-video --num-bot-envs 8 --num-selfplay-envs 16 --exp-name ppo_gridnet_selfplay_diverse_encode_decode
PPO + invalid action masking + selfplay + encoder-decoder
python ppo_gridnet_diverse_encode_decode.py --capture-video --num-bot-envs 0 --num-selfplay-envs 24 --exp-name ppo_gridnet_selfplay_encode_decode
PPO + invalid action masking + diverse bots + encoder-decoder
python ppo_gridnet_diverse_encode_decode.py --capture-video
PPO + invalid action masking + diverse bots + IMPALA-CNN
python ppo_gridnet_diverse_impala.py --capture-video
PPO + invalid action masking + diverse bots
python ppo_gridnet_diverse.py --capture-video
PPO + invalid action masking
python ppo_gridnet_coacai.py --capture-video
PPO + partial invalid action masking
python ppo_gridnet_coacai_partial_mask.py --capture-video
PPO
python ppo_gridnet_coacai_no_mask.py --capture-video
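The "invalid action masking" referenced in the script names above works by zeroing out the probability of actions that are not legal in the current state before sampling. The scripts implement this inside the policy's action distribution; the snippet below is only a minimal sketch of the idea (hypothetical names, assuming a PyTorch Categorical policy head), not the repo's exact implementation:

import torch
from torch.distributions import Categorical

def masked_categorical(logits: torch.Tensor, mask: torch.Tensor) -> Categorical:
    # Replace the logits of invalid actions with a large negative number so that
    # their probability after the softmax is numerically zero.
    masked_logits = torch.where(mask.bool(), logits, torch.full_like(logits, -1e8))
    return Categorical(logits=masked_logits)

logits = torch.randn(1, 6)                 # unnormalized action scores from the policy
mask = torch.tensor([[1, 1, 0, 0, 1, 0]])  # 1 = valid action, 0 = invalid
action = masked_categorical(logits, mask).sample()  # never samples an invalid action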
We use Weights and Biases for experiment management; it syncs the training metrics, videos of the agents playing the game, and the trained models produced by our scripts. You can enable this feature by passing the --prod-mode flag to the scripts above.
For example, try running
python ppo_diverse_impala.py --capture-video --prod-mode --wandb-project gym-microrts-paper
and you should see outputs similar to the following:
wandb: Currently logged in as: costa-huang (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.10.25 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.10.24
wandb: Syncing run MicrortsDefeatCoacAIShaped-v3__ppo_diverse_impala__1__1618184644
wandb: ⭐️ View project at https://wandb.ai/vwxyzjn/gym-microrts-paper
wandb: 🚀 View run at https://wandb.ai/vwxyzjn/gym-microrts-paper/runs/2gw2f8tl
wandb: Run data is saved locally in /home/costa/Documents/work/go/src/github.com/vwxyzjn/gym-microrts-paper/wandb/run-20210411_194404-lokq7jxs
wandb: Run `wandb offline` to turn off syncing.
Once the agents are trained with --prod-mode turned on, you can go to the experiment page to download the trained model, which can then be used for evaluation. For example, you can download this experiment's agent.pt.
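If you prefer to script the download instead of clicking through the dashboard, the wandb public API can fetch the file. A short sketch (the run path below is the example run shown in the logs above; substitute your own):

import wandb

api = wandb.Api()
run = api.run("vwxyzjn/gym-microrts-paper/2gw2f8tl")  # entity/project/run_id
run.file("agent.pt").download(replace=True)           # downloads the checkpoint locally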
This repo comes with pre-trained models in the trained_models directory. To run the evaluation for PPO + invalid action masking + diverse bots + IMPALA-CNN, for example, try running
curl -O https://microrts.s3.amazonaws.com/microrts/gym-microrts-paper/trained_models.zip && unzip trained_models.zip
python agent_eval.py --exp-name ppo_diverse_impala \
--agent-model-path trained_models/ppo_diverse_impala/agent-2.pt \
--max-steps 4000 --num-eval-runs 100 \
--wandb-project-name gym-microrts-paper-eval \
--prod-mode --capture-video
To see how we run all the evaluations, check out agent_eval.sh.
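agent_eval.py takes care of constructing the network and loading the checkpoint for you. If you just want to peek inside a downloaded checkpoint, here is a minimal sketch (assuming it is a standard PyTorch file):

import torch

# Load on CPU so no GPU is required just to inspect the file.
checkpoint = torch.load("trained_models/ppo_diverse_impala/agent-2.pt", map_location="cpu")
print(type(checkpoint))
if isinstance(checkpoint, dict):
    # A state_dict: parameter names and shapes of the saved agent.
    for name, tensor in checkpoint.items():
        print(name, tuple(tensor.shape))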
Check out the code in the plots folder. Try running
curl -O https://microrts.s3.amazonaws.com/microrts/gym-microrts-paper/all_data.csv && mv all_data.csv plots/all_data.csv
python plot_ablation.py
python plot_all.py
python plot_hist.py
python plot_shaped_vs_sparse.py
python plot_uas_vs_gridnet.py
The CSV data is obtained either through the wandb export API or directly from the wandb dashboard, such as the "Ablation Studies" report.
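As a rough sketch of what the export API can do (this snippet does not reproduce the exact columns of all_data.csv), the wandb public API lets you pull run histories into pandas:

import pandas as pd
import wandb

api = wandb.Api()
runs = api.runs("vwxyzjn/gym-microrts-paper")  # entity/project from the logs above

frames = []
for run in runs:
    history = run.history()   # sampled training metrics as a pandas DataFrame
    history["run_name"] = run.name
    frames.append(history)

pd.concat(frames, ignore_index=True).to_csv("exported_runs.csv", index=False)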
Please use the following BibTeX entry:
@inproceedings{huang2021gym,
author = {Shengyi Huang and
Santiago Onta{\~{n}}{\'{o}}n and
Chris Bamford and
Lukasz Grela},
title = {Gym-{\(\mathrm{\mu}\)}RTS: Toward Affordable Full Game Real-time Strategy
Games Research with Deep Reinforcement Learning},
booktitle = {2021 {IEEE} Conference on Games (CoG), Copenhagen, Denmark, August
17-20, 2021},
pages = {1--8},
publisher = {{IEEE}},
year = {2021},
url = {https://doi.org/10.1109/CoG52621.2021.9619076},
doi = {10.1109/CoG52621.2021.9619076},
timestamp = {Fri, 10 Dec 2021 10:41:01 +0100},
biburl = {https://dblp.org/rec/conf/cig/HuangO0G21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}