This repo attempts to reproduce the results of the PPO model found in the source code for the paper A Closer Look at Invalid Action Masking in Policy Gradient Algorithms. The original implementation is compared against a new model using Stable Baselines 3's PPO implementation, or sb3-contrib's MaskablePPO when invalid action masking is enabled.
You should already have Python 3.8+, Java 8+, and CUDA 10.1+ installed.
Install MicroRTS JAR:
rm -fR ~/microrts && mkdir ~/microrts && \
wget -O ~/microrts/microrts.zip http://microrts.s3.amazonaws.com/microrts/artifacts/202004222224.microrts.zip && \
unzip ~/microrts/microrts.zip -d ~/microrts/
Then install the necessary Python packages:
pip install -r requirements.txt
python original/new_train_ppo_4x4.py
Note that this script is functionally identical to the original found here.
# No masking
python sb3/train_ppo.py zoo [size] # Size may be 4 or 10
# Masking enabled
python sb3/train_ppo.py zoo [size] --mask # Size may be 4 or 10
This script makes use of SB3's core PPO algorithm by default. If masking is enabled, sb3-contrib's MaskablePPO is used instead.
The results in the zoo were produced via the following commands:
# 4x4 unmasked
python sb3/train_ppo.py zoo 4
python sb3/train_ppo.py zoo 4 --seed 43
python sb3/train_ppo.py zoo 4 --seed 44
# 4x4 masked
python sb3/train_ppo.py zoo 4 --mask
python sb3/train_ppo.py zoo 4 --mask --seed 43
python sb3/train_ppo.py zoo 4 --mask --seed 44
# 10x10 unmasked
python sb3/train_ppo.py zoo 10
python sb3/train_ppo.py zoo 10 --seed 43
python sb3/train_ppo.py zoo 10 --seed 44
# 10x10 masked
python sb3/train_ppo.py zoo 10 --mask
python sb3/train_ppo.py zoo 10 --mask --seed 43
python sb3/train_ppo.py zoo 10 --mask --seed 44
You may view the results in the zoo for yourself by running:
# For 4x4 environment
tensorboard --logdir zoo/4x4/runs
# For 10x10 environment
tensorboard --logdir zoo/10x10/runs