sc2atari

Convert sc2 environment to gym-atari and play some mini-games


Info & References

Here DeepMind's SC2 environment is simplified and wrapped as an OpenAI Gym environment, so that existing Atari code can be applied to simplified SC2 mini-games.
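
The conversion is roughly along these lines. The sketch below is illustrative only (the class name, constructor arguments and the fixed resolution are assumptions, not the repo's exact code): the observation is a single screen feature plane and the action is a flattened screen coordinate.

# Illustrative sketch, not the repo's actual wrapper (pysc2 1.x style constructor).
import numpy as np
import gym
from gym import spaces
from pysc2.env import sc2_env
from pysc2.lib import actions, features

_PLAYER_RELATIVE = features.SCREEN_FEATURES.player_relative.index
_SELECT_ARMY = actions.FUNCTIONS.select_army.id
_ATTACK_SCREEN = actions.FUNCTIONS.Attack_screen.id

class SC2GymEnv(gym.Env):
    def __init__(self, map_name="MoveToBeacon", resolution=32):
        self._env = sc2_env.SC2Env(map_name=map_name,
                                   screen_size_px=(resolution, resolution),
                                   minimap_size_px=(resolution, resolution))
        self._res = resolution
        # Observation: one categorical screen plane; action: one target pixel.
        self.observation_space = spaces.Box(low=0, high=4,
                                            shape=(resolution, resolution, 1))
        self.action_space = spaces.Discrete(resolution * resolution)

    def reset(self):
        return self._obs(self._env.reset()[0])

    def step(self, action):
        y, x = divmod(int(action), self._res)
        # Compound action: select the whole army, then attack-move to (x, y).
        ts = self._env.step([actions.FunctionCall(_SELECT_ARMY, [[0]])])[0]
        if not ts.last() and _ATTACK_SCREEN in ts.observation["available_actions"]:
            ts = self._env.step(
                [actions.FunctionCall(_ATTACK_SCREEN, [[0], [x, y]])])[0]
        return self._obs(ts), ts.reward, ts.last(), {}

    def _obs(self, timestep):
        # Keep only the player_relative screen plane from the full observation.
        return np.expand_dims(timestep.observation["screen"][_PLAYER_RELATIVE], -1)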

A smaller version of the FullyConv policy from https://deepmind.com/documents/110/sc2le.pdf is implemented and plugged into the OpenAI Baselines A2C implementation.
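
The general shape of such a policy is sketched below (TF 1.x style; the function name is assumed and the layer sizes only loosely follow the paper's FullyConv): padded convolutions keep the screen resolution, a 1x1 convolution gives one logit per pixel for the spatial action, and a dense layer gives the value estimate.

# Minimal sketch of a smaller FullyConv-like network, not the repo's exact code.
import tensorflow as tf

def fully_conv(screen, n_categories=5, resolution=32):
    """screen: [batch, res, res] int32 categorical plane (e.g. player_relative)."""
    # One-hot encode the categorical feature plane before convolving.
    x = tf.one_hot(screen, depth=n_categories, axis=-1)
    conv1 = tf.layers.conv2d(x, 16, 5, padding="same", activation=tf.nn.relu)
    conv2 = tf.layers.conv2d(conv1, 32, 3, padding="same", activation=tf.nn.relu)
    # Spatial policy: one logit per screen pixel, flattened into a single softmax.
    logits = tf.layers.conv2d(conv2, 1, 1, padding="same")
    spatial_policy = tf.nn.softmax(tf.reshape(logits, [-1, resolution * resolution]))
    # State value from a dense layer on the shared convolutional features.
    fc = tf.layers.dense(tf.reshape(conv2, [-1, resolution * resolution * 32]),
                         256, activation=tf.nn.relu)
    value = tf.layers.dense(fc, 1)
    return spatial_policy, value

# Example: build the graph for 32x32 screens.
screen_ph = tf.placeholder(tf.int32, [None, 32, 32])
policy, value = fully_conv(screen_ph)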

With this, the 3 easiest mini-games can be "solved" quickly.

See also https://github.com/islamelnabarawy/sc2agents and https://github.com/islamelnabarawy/sc2gym for similar projects (the work here was done independently, but later).

Results

Map                      Episodes   Avg score   Max score   DeepMind avg   DeepMind max
MoveToBeacon             32*200     25          30          26             45
CollectMineralShards**   32*5000    73          100         103            134
DefeatRoaches**          48*4000    46          260         100            355

** Performance on CollectMineralShards and DefeatRoaches was still improving slightly.

  • Avg and max are from the last n_envs*100 episodes.
  • All maps used the parameters in the repo, except n_envs=32 (48 for DefeatRoaches).
  • Episodes is the total number of episodes played across all environments.

DeepMind scores are shown for comparison; they are the FullyConv results reported in the release paper.

How to run

Install the requirements below (Baselines etc.), clone the repo, and run:

python run_sc2_a2c.py --map_name MoveToBeacon --n_envs 32

This won't save any files. Some results are printed to stdout.
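
The other maps from the results table run the same way, for example:

python run_sc2_a2c.py --map_name DefeatRoaches --n_envs 48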

Requirements

  • Python 3 (will NOT work with python 2)
  • OpenAI's Baselines (tested with 0.1.4). You can also skip the installation and drop the baselines folder inside this repo; most of the dependencies in Baselines are not really needed if you use only A2C.
  • pysc2 (tested with v1.2)
  • Tensorflow (tested with 1.3.0)
  • Other standard Python packages such as numpy.

Notes

Here only the screen player_relative observation from the original observation space is used. The action space is limited to a single compound action: select the whole army, then attack-move to a screen location (much like how the author plays SC2 himself).

With this slice of the observation/action space the agent can learn the 3 mini-games mentioned above, but it is not enough for anything more complicated.

The action/observation-space limitation makes the problem much easier, faster, and less general/interesting. Because of this, and because of differences in the network and hyperparameters, the scores are not directly comparable to the release paper.

The scores achieved here are considerably lower than DeepMind's, which suggests that the limited action space is not enough for optimal performance (e.g. micro against the Roaches, or controlling the two marines separately in CollectMineralShards).