StarCraft II replay converter

Extracts data from websites and creates datasets for ML or analysis purposes.


About the Project

This repository is dedicated to gathering and organizing datasets for machine-learning-based StarCraft II bots. The project has two aims: first, to provide a tool for collecting replay data that can be used for supervised training; second, to create datasets suitable for training value functions in reinforcement learning algorithms.

Available functionality:

Collect replays from two websites (sc2rep and spawningtool)
Preprocess data into a human-readable form
Transform data and load it into the database

Limitations to consider:

Only the 1v1 game mode is supported.
Supported game versions: 5.0.0 to 5.0.11.

Prerequisites

Python 3.9 or earlier (the latest version of the sc2replay library supports Python up to 3.9).
Access to a configured PostgreSQL database.
The packages listed in requirements.txt.
Optional: Jupyter Notebook.
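If you are unsure which interpreter your virtual environment will use, a quick check like the following may help. It is a hypothetical helper (not part of the repository) that only encodes the "Python 3.9 or earlier" requirement stated above.

```python
import sys

# Highest Python version listed in the prerequisites.
MAX_SUPPORTED = (3, 9)


def python_is_supported(version=None):
    """Return True if the (major, minor) version is at most 3.9.

    `version` is any tuple like (3, 8) or (3, 8, 10); it defaults to the
    running interpreter's sys.version_info.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if python_is_supported():
        print(f"Python {current} is within the supported range.")
    else:
        print(f"Python {current} is newer than 3.9; create the venv with Python 3.9 or earlier.")
```

Run it with the same `python` you plan to use for `python -m venv venv`.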

Setup

Create a new PostgreSQL database (you can follow this guide for Linux or this guide for Windows). Using psql:

create database sc2replays;
\c sc2replays

Clone the repository:

git clone https://github.com/dvarkless/sc2_replay_converter.git

Create a python virtual environment:

cd sc2_replay_converter
python -m venv venv

If you are using Linux or macOS:

source ./venv/bin/activate

If you are using Windows (PowerShell):

./venv/Scripts/activate.ps1

Install packages:

pip install -r requirements.txt

Download the submodule:

git submodule update --init --recursive

Configuration

Configuration files are located in the ./configs directory.

Database access:

File ./configs/secrets.yml

db_host: localhost # Database url address
db_name: sc2replays # Database name
db_user: dvarkless # Username that can interact with the DB
db_password: password # Password for this user; set to `None` if the user has no password
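The four keys above are the usual parameters of a PostgreSQL connection. As an illustration only, the hypothetical helper below turns them into a libpq keyword/value connection string; it is not how the repository necessarily connects. It assumes the YAML has already been loaded into a dict (for example with yaml.safe_load) and treats both a real null and the string "None" as "no password", since YAML reads a bare None as the string "None" rather than null.

```python
def build_conninfo(secrets):
    """Build a libpq keyword/value connection string from secrets.yml values.

    `secrets` is a dict shaped like secrets.yml. A password of None, or the
    string "None", is left out of the result.
    """
    params = {
        "host": secrets["db_host"],
        "dbname": secrets["db_name"],
        "user": secrets["db_user"],
    }
    password = secrets.get("db_password")
    if password is not None and str(password) != "None":
        params["password"] = password

    def quote(value):
        # libpq accepts single-quoted values; backslashes and single quotes
        # inside a value must be escaped with a backslash.
        text = str(value).replace("\\", "\\\\").replace("'", "\\'")
        return f"'{text}'"

    return " ".join(f"{key}={quote(value)}" for key, value in params.items())


secrets = {
    "db_host": "localhost",
    "db_name": "sc2replays",
    "db_user": "dvarkless",
    "db_password": "password",
}
print(build_conninfo(secrets))
```

A driver such as psycopg2 accepts this kind of string directly, e.g. `psycopg2.connect(build_conninfo(secrets))`.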

File ./configs/downloader_config.yml

The only setting you are likely to need to change here is the user agent:

headers:
  user_agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
  # Chrome from Windows device
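The user agent is the User-Agent header sent with each HTTP request to the replay sites. The sketch below shows where that value ends up, using only the standard library's urllib; the repository's downloader may use a different HTTP client, so treat this as an illustration, not its actual code. The request is built but never sent.

```python
import urllib.request

# Value copied from downloader_config.yml (headers -> user_agent).
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
)


def make_request(url, user_agent=USER_AGENT):
    """Build (but do not send) a GET request carrying the configured User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = make_request("https://example.com/replays")
# urllib stores header names capitalized, hence "User-agent" here.
print(request.get_header("User-agent"))
```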

To add another site, add it to the config and write a corresponding method in the ReplayDownloader class (def name_yield: ...).
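What such a method looks like depends on the ReplayDownloader internals, which are not described here. As a rough, hypothetical sketch: a `<name>_yield` method could be a generator that receives page HTML and yields replay download URLs. The function name `newsite_yield`, its `(page_url, html_text)` input, and the assumption that replay links end in `.SC2Replay` are all illustrative; fetching the pages is left to the downloader.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ReplayLinkParser(HTMLParser):
    """Collect <a href> values that point to .SC2Replay files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".sc2replay"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def newsite_yield(pages):
    """Hypothetical `<name>_yield` generator for a new site.

    `pages` is an iterable of (page_url, html_text) pairs. Yields absolute
    replay URLs in page order, skipping duplicates.
    """
    seen = set()
    for page_url, html_text in pages:
        parser = ReplayLinkParser(page_url)
        parser.feed(html_text)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                yield link


page = '<a href="/files/game1.SC2Replay">1</a><a href="/about">About</a>'
links = list(newsite_yield([
    ("https://replays.example.com/list?page=1", page),
    ("https://replays.example.com/list?page=2", page),
]))
print(links)
```

Keeping the generator lazy lets the caller stop as soon as `max_count` replays have been collected.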

Usage

Example code is provided in the download_and_process.ipynb notebook.

Collect replays:

from replay_downloader import ReplayDownloader

REPLAY_DIR = "../replays"
DOWNLOADER_CONFIG = "./configs/downloader_config.yml"

downloader = ReplayDownloader(REPLAY_DIR, DOWNLOADER_CONFIG, max_count=500, jupyter=True)
downloader.start_download("sc2rep")
# downloader.start_download("spawningtool")

Preprocess files:

from replay_process import ReplayProcess, ReplayFilter
from datetime import datetime

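REPLAY_DIR = "../replays"
SECRETS = "./configs/secrets.yml"
# DATABASE_CONFIG is used below but was not defined; this file name is an assumption,
# point it at the database config in ./configs
DATABASE_CONFIG = "./configs/database_config.yml"
GAME_INFO_FILE = "./starcraft2_replay_parse/game_info.csv"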

processor = ReplayProcess(
    SECRETS,
    DATABASE_CONFIG,
    GAME_INFO_FILE,
    jupyter=True
)

# Setup filter
replay_filter = ReplayFilter()
replay_filter.is_1v1 = True # Select only 1v1 games
replay_filter.game_len = [1920, 38400] # Games with length from 2 to 40 mins
replay_filter.time_played = datetime(2021, 1, 1) # Earliest allowed game

# Process replays (this should take a while)
processor.process_replays(REPLAY_DIR, filt=replay_filter)
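The game_len bounds are given in ticks, not minutes. The comment's figures (1920 to 38400 ticks for 2 to 40 minutes) imply 16 ticks per in-game second, which matches the number of StarCraft II game loops per game second. The sketch below converts between the two and shows one plausible reading of how the three filter settings combine. GameMeta and passes() are hypothetical and are not the ReplayFilter implementation; inclusive length bounds are also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumption: 16 ticks per in-game second, consistent with the comment above.
TICKS_PER_SECOND = 16
TICKS_PER_MINUTE = TICKS_PER_SECOND * 60  # 960


def minutes_to_ticks(minutes):
    return int(minutes * TICKS_PER_MINUTE)


def ticks_to_minutes(ticks):
    return ticks / TICKS_PER_MINUTE


@dataclass
class GameMeta:
    """Hypothetical per-replay summary used only for this illustration."""
    player_count: int
    length_ticks: int
    played_at: datetime


def passes(meta, is_1v1=True, game_len=(1920, 38400), time_played=datetime(2021, 1, 1)):
    """Apply the three filter settings from the example to one GameMeta."""
    if is_1v1 and meta.player_count != 2:
        return False
    low, high = game_len
    if not low <= meta.length_ticks <= high:
        return False
    return meta.played_at >= time_played


print(minutes_to_ticks(2), minutes_to_ticks(40))
print(passes(GameMeta(2, minutes_to_ticks(10), datetime(2022, 5, 1))))
```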

Create dataset tables:

from itertools import product
from pipeline import PipelineComposer

MINS_PER_SAMPLE = 4  # On average, take an input sample every 4 minutes
PRED_STEP = 1  # Take the paired target sample 1 minute after each input sample
MIN_LEAGUE = 3  # Minimum league: 3 = Gold

r_pairs = product("ZTP", repeat=2) # ((Z, Z), (Z, T), ...)
matchups = ["v".join((r1, r2)) for r1, r2 in r_pairs] # ['ZvZ', 'ZvT', ...]
composer = PipelineComposer("ZvZ", tick_step=32)

# Create and run a composition pipeline for each matchup
for matchup in matchups:
    composer.change_matchup(matchup)
    comp_pipeline = composer.get_compositon(MINS_PER_SAMPLE, PRED_STEP, MIN_LEAGUE)
    comp_pipeline.run()
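The sampling parameters can be read as follows: input ticks are spaced MINS_PER_SAMPLE minutes apart on average, and each input is paired with a target tick PRED_STEP minutes later, all aligned to tick_step. The sketch below is one hypothetical way to produce such pairs, not the PipelineComposer logic. The uniform jitter of 0.5x to 1.5x the mean gap and the 16 ticks per second rate are assumptions.

```python
import random

TICKS_PER_MINUTE = 960  # assumption: 16 ticks per in-game second


def sample_tick_pairs(game_len, mins_per_sample=4, pred_step=1, tick_step=32, rng=None):
    """Pick (input_tick, target_tick) pairs from one game of `game_len` ticks.

    Gaps between input ticks are drawn uniformly from 0.5x to 1.5x of
    mins_per_sample minutes, so they average mins_per_sample. Ticks are
    snapped down to multiples of tick_step, and each target lies
    pred_step minutes after its input.
    """
    rng = rng or random.Random()
    mean_gap = mins_per_sample * TICKS_PER_MINUTE
    offset = pred_step * TICKS_PER_MINUTE
    pairs = []
    tick = 0
    while True:
        gap = rng.uniform(0.5, 1.5) * mean_gap
        tick += int(gap) // tick_step * tick_step  # snap the gap to the tick grid
        target = tick + offset
        if target > game_len:
            break
        pairs.append((tick, target))
    return pairs


pairs = sample_tick_pairs(38400, rng=random.Random(0))
print(pairs)
```

With the default values, a 40-minute game yields roughly ten pairs, and a game shorter than about three minutes may yield none.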

Table schemas:

Table schemas are defined in ./queries/create_*.sql.

Dataset tables are created dynamically.
Primary key: (tick, game_id). Foreign key: game_id references game_info.
Their structure:
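Because the column set depends on the matchup, the CREATE TABLE statement has to be assembled at runtime. The hypothetical helper below sketches what that could look like for the key structure described above. The actual statements live in ./queries/create_*.sql, and the table and column names in the example are made up. Omitting the column list after REFERENCES makes PostgreSQL use game_info's primary key.

```python
def create_dataset_table_sql(table_name, columns):
    """Build a PostgreSQL CREATE TABLE statement for a dataset table.

    `columns` maps extra column names to SQL types; game_id and tick are
    always added. Names are assumed to be trusted identifiers.
    """
    lines = ["game_id INTEGER", "tick INTEGER"]
    lines += [f"{name} {sql_type}" for name, sql_type in columns.items()]
    lines.append("PRIMARY KEY (tick, game_id)")
    lines.append("FOREIGN KEY (game_id) REFERENCES game_info")
    body = ",\n    ".join(lines)
    return f"CREATE TABLE IF NOT EXISTS {table_name} (\n    {body}\n);"


sql = create_dataset_table_sql(
    "zvt_comp",  # hypothetical table name
    {"player_zergling": "INTEGER", "out_zergling": "NUMERIC(4, 3)"},
)
print(sql)
```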

*_comp tables:

[NOTE] These tables are used to train the agent to choose which unit to build next, based on army composition and scouting information.

game_id: INTEGER,
tick: INTEGER,
player_unit: INTEGER,
...
player_building: INTEGER,
...
player_minerals_available: INTEGER, 
player_vespene_available: INTEGER, 
enemy_unit: INTEGER,
...
out_unit: NUMERIC(4, 3) # precision 0.001; player's units 1 minute after the current tick
...

*_winprob tables:

[NOTE] These tables are used to train agents to predict the game outcome from the available information.

game_id: INTEGER,
tick: INTEGER,
player_unit: INTEGER,
...
player_building: INTEGER,
...
player_upgrade: INTEGER,
...
player_minerals_available: INTEGER, 
player_vespene_available: INTEGER, 
enemy_unit: INTEGER,
...
enemy_building: INTEGER,
...
out_winprob: NUMERIC(4, 3) # precision 0.001; probability that the game ends within 1 minute,
                           # where 1 = player's win and 0 = player's defeat

*_enemycomp tables:

[NOTE] These tables are used to train agents to predict the enemy's army composition from scouted buildings.

game_id: INTEGER,
tick: INTEGER,
enemy_building: INTEGER,
...
out_unit: NUMERIC(4, 3) # precision 0.001; enemy units 1 minute after the current tick
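In all three table types, the out_* columns are the training targets, and the remaining columns other than game_id and tick are inputs. The hypothetical helper below splits a fetched row on that naming convention. It assumes rows arrive as column-to-value dicts and converts everything to float, because drivers such as psycopg2 return NUMERIC values as decimal.Decimal by default. The column names in the example are made up.

```python
from decimal import Decimal


def split_row(row, target_prefix="out_", skip=("game_id", "tick")):
    """Split one dataset row (column -> value dict) into inputs and targets.

    Columns starting with `target_prefix` are targets, identifier columns in
    `skip` are dropped, and everything else is an input feature. Values are
    converted to float; dict order follows the row's column order.
    """
    features, targets = {}, {}
    for column, value in row.items():
        if column in skip:
            continue
        if column.startswith(target_prefix):
            targets[column] = float(value)
        else:
            features[column] = float(value)
    return features, targets


row = {
    "game_id": 1,
    "tick": 960,
    "player_zergling": 12,
    "enemy_marine": 8,
    "out_zergling": Decimal("0.250"),
}
features, targets = split_row(row)
print(features, targets)
```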

Matchups:

The first letter of a matchup is the player's race and the last letter is the enemy's race.
For example, 'ZvT' means player = 'Zerg', enemy = 'Terran'.
The matchup determines the table's unit, building and upgrade columns. The full column list can be found in ./starcraft2_replay_parse/data/game_info.csv.

[NOTE] Mirror matchups are counted twice: each game appears once from each player's point of view, with player and enemy swapped.
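The naming rule and the mirror note can be summarized in a few lines. In the sketch below, both helpers are hypothetical. It also assumes that every game is viewed from both players' sides, so a ZvT game feeds the ZvT and TvZ tables, while a mirror game contributes two rows sets to the same table.

```python
RACES = {"Z": "Zerg", "T": "Terran", "P": "Protoss"}


def parse_matchup(matchup):
    """Split a matchup code such as 'ZvT' into (player_race, enemy_race)."""
    player, sep, enemy = matchup.partition("v")
    if sep != "v" or player not in RACES or enemy not in RACES:
        raise ValueError(f"not a matchup code: {matchup!r}")
    return RACES[player], RACES[enemy]


def perspectives(race_1, race_2):
    """Return the matchup codes one game contributes to, one per player.

    A mirror game (e.g. 'Z' vs 'Z') yields the same code twice, once with
    each player in the 'player' role.
    """
    return [f"{race_1}v{race_2}", f"{race_2}v{race_1}"]


print(parse_matchup("ZvT"))
print(perspectives("Z", "T"), perspectives("Z", "Z"))
```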

License

Distributed under the MIT License. See LICENSE.txt for more information.