Here DeepMind's SC2 environment is simplified and converted into an OpenAI Gym environment, so that existing Atari code can be applied to the simplified SC2 mini-games.
The FullyConv policy (smaller version) from https://deepmind.com/documents/110/sc2le.pdf is implemented and plugged into OpenAI Baselines' A2C implementation.
With this, the three easiest mini-games can be "solved" quickly.
See also: https://github.com/islamelnabarawy/sc2agents and https://github.com/islamelnabarawy/sc2gym for similar projects (the work here was done independently but later).
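For a rough idea of the network, below is a sketch of what a smaller FullyConv-style policy can look like in TensorFlow 1.x: convolutions with "same" padding that keep the screen resolution, a 1x1 convolution producing spatial policy logits (one per screen pixel), and a fully connected value head. The function name `fully_conv`, the layer sizes and the single-screen-input setup are illustrative assumptions and may differ from the actual policy in this repo and in the paper.

```python
import tensorflow as tf

def fully_conv(screen, n_pixels):
    """Illustrative FullyConv-style network (not the repo's exact policy).

    screen:   [batch, size, size, channels] float32 tensor with static shape
    n_pixels: size * size, the number of possible spatial actions
    """
    # Convolutions with 'same' padding keep the spatial resolution intact.
    conv1 = tf.layers.conv2d(screen, 16, 5, padding="same",
                             activation=tf.nn.relu, name="conv1")
    conv2 = tf.layers.conv2d(conv1, 32, 3, padding="same",
                             activation=tf.nn.relu, name="conv2")
    # Spatial policy head: a 1x1 conv gives one logit per screen pixel.
    spatial_logits = tf.layers.conv2d(conv2, 1, 1, name="spatial_logits")
    policy_logits = tf.reshape(spatial_logits, [-1, n_pixels])
    # Value head: fully connected layer on the flattened conv features.
    flat = tf.contrib.layers.flatten(conv2)
    fc = tf.layers.dense(flat, 256, activation=tf.nn.relu, name="fc")
    value = tf.squeeze(tf.layers.dense(fc, 1, name="value"), axis=-1)
    return policy_logits, value
```

Outputs of this shape (spatial policy logits plus a value estimate) are roughly what the Baselines A2C implementation consumes in place of its default CNN policy.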
| Map | Episodes | Avg score | Max score | DeepMind avg | DeepMind max |
| --- | --- | --- | --- | --- | --- |
| MoveToBeacon | 32*200 | 25 | 30 | 26 | 45 |
| CollectMineralShards** | 32*5000 | 73 | 100 | 103 | 134 |
| DefeatRoaches** | 48*4000 | 46 | 260 | 100 | 355 |
**CollectMineralShards and DefeatRoaches performance was still improving slightly when training was stopped.
- Avg and max are from the last n_envs*100 episodes.
- All maps used the parameters seen in the repo, except n_envs=32 (48 for DefeatRoaches).
- Episodes is the total number of episodes played across all environments.
DeepMind scores are shown for comparison; they are the FullyConv results reported in the release paper.
Install the requirements (Baselines etc.) listed below, clone the repo and run:
`python run_sc2_a2c.py --map_name MoveToBeacon --n_envs 32`
This won't save any files. Some results are printed to stdout.
- Python 3 (will NOT work with Python 2)
- OpenAI's Baselines (tested with 0.1.4). You can also skip installing it and drop the baselines folder inside this repo; most of Baselines' dependencies are not really needed if you only use A2C.
- pysc2 (tested with v1.2)
- TensorFlow (tested with 1.3.0)
- Other standard Python packages such as NumPy etc.
Here we use only the player_relative screen observation from the original observation space. The action space is limited to a single action: select army followed by attack-move (the same as the author does when he plays SC2).
With this slice of the observation/action space the agent can learn the 3 mini-games mentioned above; however, it is not enough for anything more complicated. A sketch of how such a restriction could look is shown below.
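As an illustration only, the sketch below shows how this slice could be exposed through Gym's interface: the player_relative screen layer as the observation, and a single Discrete action that picks a screen pixel, translated into "select army" at the start of the episode and an attack-move to that pixel on every step. The class name `SC2GymWrapper` and the exact pysc2 calls are assumptions based on the pysc2 1.x API, not necessarily how this repo's wrapper is written.

```python
import gym
import numpy as np
from gym import spaces
from pysc2.lib import actions, features

# Index of the player_relative layer among the screen features (pysc2 1.x).
_PLAYER_RELATIVE = features.SCREEN_FEATURES.player_relative.index

class SC2GymWrapper(gym.Env):  # hypothetical name, sketch only
    """Expose a pysc2 mini-game through the Gym interface."""

    def __init__(self, sc2_env, screen_size=32):
        # sc2_env is assumed to be a pre-built pysc2 SC2Env whose screen
        # resolution matches screen_size.
        self.sc2_env = sc2_env
        self.screen_size = screen_size
        self.observation_space = spaces.Box(
            low=0, high=4, shape=(screen_size, screen_size), dtype=np.int32)
        # One discrete action per screen pixel = attack-move target.
        self.action_space = spaces.Discrete(screen_size * screen_size)

    def _obs(self, timestep):
        # pysc2 1.x observations: dict with a stacked "screen" array.
        return timestep.observation["screen"][_PLAYER_RELATIVE]

    def reset(self):
        timestep = self.sc2_env.reset()[0]
        # Select the whole army once at the start of the episode.
        select = actions.FunctionCall(actions.FUNCTIONS.select_army.id, [[0]])
        timestep = self.sc2_env.step([select])[0]
        return self._obs(timestep)

    def step(self, action):
        # Decode the flat action index into (x, y) screen coordinates and
        # issue an unqueued attack-move to that point.
        y, x = divmod(int(action), self.screen_size)
        attack = actions.FunctionCall(
            actions.FUNCTIONS.Attack_screen.id, [[0], [x, y]])
        timestep = self.sc2_env.step([attack])[0]
        return self._obs(timestep), timestep.reward, timestep.last(), {}
```

Collapsing everything into a single Discrete action over screen pixels is what lets existing Gym/Atari-style agents run on the mini-games unchanged.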
The action/obs-space limitation makes the problem much easier, faster and less general/interesting. Because of this and the differences in the network and hyperparameters, the scores are not directly comparable with the release paper.
The scores achieved here are considerably lower than the DeepMind results, which suggests that the limited action space is not enough to achieve optimal performance (e.g. micro against Roaches, or using the two marines separately in CollectMineralShards).