# pysc2-RLagents

Notes and scripts for SC2LE released by DeepMind and Blizzard; more details [here](https://github.com/deepmind/pysc2).

## Important Links

- Original SC2LE Paper
- DeepMind blog post
- Blizzard blog post
- [PySC2 repo](https://github.com/deepmind/pysc2)
- Blizzard's SC2 API
- Blizzard's SC2 API Protocol
- Python library for the SC2 API Protocol

## Work by others

- Chris' blog post and repo
- Siraj's YouTube tutorial and accompanying code
- Steven's Medium articles for a simple scripted agent and one based on Q-tables
- pekaalto's work on adapting OpenAI's gym environment to SC2LE, plus an implementation of the FullyConv algorithm and results on three minigames
- Arthur Juliani's posts and repo for RL agents (not SC2LE-specific, but mentioned here because my agent script was built on Juliani's A3C implementation)

Let me know if anyone else is also working on this and I'll add a link here!

## Notes

Contains general notes on working with SC2LE.

### Total Action Space

The entire unfiltered action space for an SC2LE agent.

It contains 524 base actions/functions and 101,938,719 possible actions in total, given a minimap_resolution of (64, 64) and a screen_resolution of (84, 84).
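As a rough check, these numbers can be reproduced by enumerating `pysc2.lib.actions.FUNCTIONS` and multiplying out the argument sizes. The sketch below is my own, not code from this repo; it assumes the resolutions above for the spatial argument types.

```python
from pysc2.lib import actions

SCREEN = (84, 84)
MINIMAP = (64, 64)

def arg_size(arg):
    """Number of possible values for one argument type."""
    if arg.name in ("screen", "screen2"):
        return SCREEN[0] * SCREEN[1]
    if arg.name == "minimap":
        return MINIMAP[0] * MINIMAP[1]
    size = 1
    for s in arg.sizes:  # non-spatial argument types have fixed sizes
        size *= s
    return size

total = 0
for func in actions.FUNCTIONS:
    combos = 1
    for arg in func.args:
        combos *= arg_size(arg)
    total += combos

print(len(actions.FUNCTIONS), total)  # expected: 524 base actions, ~1e8 actions
```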

### List of Action Argument Types

The entire list of action argument types for use in the actions / functions.

It contains 13 argument types with descriptions.
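The argument types can also be listed directly from `pysc2.lib.actions.TYPES`; note that the spatial types report placeholder sizes until they are bound to a resolution.

```python
from pysc2.lib import actions

# Print the 13 argument types and their (unbound) sizes.
for arg_type in actions.TYPES:
    print(arg_type.name, arg_type.sizes)
```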

### Running an Agent

Notes on running an agent in the pysc2.env.sc2_env.SC2Env environment. In particular, these show details and brief descriptions of the TimeStep object (the observation) that is fed to an agent's step function and returned from calling the environment's step function.
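For reference, a minimal run loop looks roughly like the sketch below. It assumes a local StarCraft II install with the minigame maps, and pysc2 1.x-style SC2Env constructor arguments (screen_size_px / minimap_size_px); it only issues no_op actions.

```python
from pysc2.env import sc2_env
from pysc2.lib import actions

def run(num_steps=100):
    with sc2_env.SC2Env(map_name="DefeatRoaches",
                        screen_size_px=(84, 84),
                        minimap_size_px=(64, 64)) as env:
        timesteps = env.reset()  # tuple of TimeStep objects, one per agent
        for _ in range(num_steps):
            ts = timesteps[0]
            # TimeStep fields: step_type, reward, discount, observation.
            # observation is a dict of feature layers and other data,
            # e.g. obs["screen"], obs["minimap"], obs["available_actions"].
            obs = ts.observation
            _ = obs["available_actions"]  # ids a real agent would choose from
            action = actions.FunctionCall(actions.FUNCTIONS.no_op.id, [])
            timesteps = env.step([action])
            if timesteps[0].last():
                break

if __name__ == "__main__":
    run()
```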

## ResearchLog

Contains notes on developing RL agents for SC2LE.

## Agents

Contains a script that trains an A3C agent for the DefeatRoaches minigame.

### PySC2_A3Cagent.py

I initially focused on the DefeatRoaches minigame, so the state space only takes in 7 screen features and 3 nonspatial features, and the action space is limited to 17 base actions and their relevant arguments.
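For illustration, screen feature layers can be picked out by name via `pysc2.lib.features.SCREEN_FEATURES`; the layers named below are examples only, not necessarily the 7 used in the script.

```python
import numpy as np
from pysc2.lib import features

# Example subset of screen feature layers, selected by index.
# These particular names are illustrative; the script's 7 layers may differ.
LAYER_NAMES = ["player_relative", "unit_type", "selected",
               "unit_hit_points", "unit_density"]
LAYER_INDICES = [getattr(features.SCREEN_FEATURES, name).index
                 for name in LAYER_NAMES]

def preprocess_screen(observation):
    """Stack the chosen screen layers from a TimeStep observation dict."""
    screen = np.asarray(observation["screen"])  # shape: (num_layers, 84, 84)
    return screen[LAYER_INDICES]
```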

For the action space, the base actions and their arguments are modeled independently. The x and y coordinates of spatial arguments are also modeled independently, to further reduce the effective action space.

The agent currently samples actions from the distributions returned by the policy networks, instead of following an epsilon-greedy policy.
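Roughly, the factorized sampling looks like the sketch below; the head names and the zero defaults for non-spatial arguments are my assumptions, not the exact code in PySC2_A3Cagent.py.

```python
import numpy as np
from pysc2.lib import actions

def sample_action(base_probs, x_probs, y_probs, available_ids):
    """Sample from independent categorical heads and build a FunctionCall.

    base_probs is assumed to be indexed by the full pysc2 function id;
    x_probs and y_probs are the independent spatial heads (84 + 84 outputs
    rather than a joint 84 * 84 softmax).
    """
    # Mask out unavailable base actions, renormalize, then sample.
    masked = np.zeros_like(base_probs)
    masked[available_ids] = base_probs[available_ids]
    masked /= masked.sum()
    base_id = int(np.random.choice(len(masked), p=masked))

    # Spatial coordinates sampled independently.
    x = int(np.random.choice(len(x_probs), p=x_probs))
    y = int(np.random.choice(len(y_probs), p=y_probs))

    # Fill in only the arguments the chosen function requires; non-spatial
    # arguments are defaulted to 0 here purely for brevity.
    args = []
    for arg in actions.FUNCTIONS[base_id].args:
        if arg.name in ("screen", "minimap", "screen2"):
            args.append([x, y])
        else:
            args.append([0])
    return actions.FunctionCall(base_id, args)
```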

Also, the policy networks for the arguments are updated regardless of whether the argument was actually used (e.g. even if a no_op action is taken, the argument policies are still updated), which should probably be corrected.
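One possible fix is to mask the argument terms of the policy loss so that only the argument heads actually consumed by the chosen base action are updated. A rough sketch, assuming the per-head log-probabilities are computed elsewhere:

```python
from pysc2.lib import actions

def masked_policy_loss(base_log_prob, arg_log_probs, base_id, advantage):
    """Policy-gradient loss that ignores unused argument heads.

    arg_log_probs: dict mapping argument-type name (e.g. "screen", "queued")
    to the log-probability of the value sampled from that head.
    """
    used = {arg.name for arg in actions.FUNCTIONS[base_id].args}
    log_prob = base_log_prob + sum(
        lp for name, lp in arg_log_probs.items() if name in used)
    return -log_prob * advantage
```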

I will be updating this to work with all the minigames.