Warthunder Replay Parser

Warthunder replay files unfortunately do not seem to contain any easily readable information (unlike WOT replays, which include some JSON). This is a very, very basic attempt at parsing Warthunder replay files (.wrpl). There is wt-tools, but it does not seem to work with (multipart) server replays.

How to use it?

There are three scripts available:

replays_scraper.py

⚠️ Use at your own risk, scraping (protected) webpages might be against the TOS/law in certain countries

This script can be used to scrape replays from the https://warthunder.com/en/tournament/replay/ page. Invoke it like this:

python replays_scraper.py <num_pages>

where <num_pages> is the number of pages to scrape (typically there are 25 replays per page). It will print a JSON object with all the replays found.

Since the page is login-protected, this script expects an auth_cookie.json file with the cookies for the login:

auth_cookie.json:

{
	"identity_sid" : "..."
}

where ... is the value of the identity_sid cookie (which you can get by logging in to warthunder.com and reading the cookies in your browser).
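
For reference, a minimal sketch of what fetching one replay page with that cookie could look like (the requests dependency and the page query parameter are assumptions for illustration, not necessarily what the script actually does):

import json
import requests  # assumed HTTP client; the actual script may use something else

# load the identity_sid cookie saved in auth_cookie.json
with open("auth_cookie.json") as f:
    cookies = json.load(f)

# fetch one page of the tournament replay listing
# (the "page" query parameter name is a guess for illustration)
resp = requests.get(
    "https://warthunder.com/en/tournament/replay/",
    params={"page": 1},
    cookies=cookies,
)
resp.raise_for_status()
html = resp.text  # raw HTML, to be scanned for replay entries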

download_replay.py

Download a replay from https://warthunder.com/en/tournament/replay/.

python download_replay.py <replay_id>

where <replay_id> is the replay ID (64-bit, either in decimal or hexadecimal notation). This will store the replay files in a folder named after the replay ID in hex notation.
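
Judging by the example output further below, the folder name is the 64-bit ID zero-padded to 16 lowercase hex digits; a small sketch of that conversion (a hypothetical helper, not the script's actual _get_hex_id):

# convert a replay ID given in decimal or hexadecimal notation to the
# 16-digit, zero-padded, lowercase hex folder name (inferred from the example output)
def replay_folder_name(replay_id: str) -> str:
    value = int(replay_id, 0)  # base 0 accepts "12345" as well as "0x005569aa001501ca"
    return f"{value:016x}"

print(replay_folder_name("0x005569aa001501ca"))  # -> 005569aa001501ca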

parse_replay.py

Parse a replay in a folder:

python parse_replay.py <replay_folder>

It expects the replay files to be named 0000.wrpl, 0001.wrpl, etc. If a <replay_folder> is not given, it will use the current directory.
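
As an illustration, collecting the multipart files in the order the parser expects could look roughly like this (a sketch, not the actual implementation):

import os
import sys

# take the replay folder from the command line, default to the current directory
folder = sys.argv[1] if len(sys.argv) > 1 else "."

# gather the numbered parts (0000.wrpl, 0001.wrpl, ...) in order
parts = sorted(
    os.path.join(folder, name)
    for name in os.listdir(folder)
    if name.endswith(".wrpl")
)
for part in parts:
    print("parsing", part)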

The output will be in JSON form:

parsing replay in /path/to/replay/005569aa001501ca
parsing /path/to/replay/005569aa001501ca/0000.wrpl
parsing /path/to/replay/005569aa001501ca/0001.wrpl
parsing /path/to/replay/005569aa001501ca/0002.wrpl
parsing /path/to/replay/005569aa001501ca/0003.wrpl
parsing /path/to/replay/005569aa001501ca/0004.wrpl
parsing /path/to/replay/005569aa001501ca/0005.wrpl
parsing /path/to/replay/005569aa001501ca/0006.wrpl
parsing /path/to/replay/005569aa001501ca/0007.wrpl
parsing /path/to/replay/005569aa001501ca/0008.wrpl
parsing /path/to/replay/005569aa001501ca/0009.wrpl

{
    "level": "levels/avg_normandy.bin",
    "mission_file": "gamedata/missions/cta/tanks/normandy/normandy_dom.blk",
    "mission_name": "normandy_Dom",
    "time_of_day": "day",
    "weather": "hazy",
    "time_of_battle_ts": 1641217514,
    "time_of_battle": "2022-01-03 14:45:14",
    "num_players": 21,
    "players": [
        {
            "player_id": 34,
            "vehicles": [
                "us_m1a1_abrams",
                "us_m1a1_hc_abrams"
            ]
        },
        {
            "player_id": 35,
            "vehicles": [
                "us_m1_ip_abrams",
                "us_hstv_l"
            ]
        },
        ...
    ]
}

Use as module

You can also use the scripts as modules:

import replays_scraper
import download_replay
import parse_replay

# set the cookies
cookies = { "identity_sid" : "secret_key" }

# download the html
pages = replays_scraper.download_pages(1, cookies)

# scrape replay data from html
replays = []
for page in pages:
	replays += replays_scraper.parse_page(page)

# download the files of the last replay
download_replay.downloadReplay(replays[-1]["id"])

# get the hexadecimal id (= folder name)
replay_id_hex = download_replay._get_hex_id(replays[-1]["id"])

# parse the replay
print(parse_replay.parse_replay(replay_id_hex))

CSV Stats

Add your own cookie to parse_many.py, in the line:

cookies = { "identity_sid" : "" }

First, download the replay pages. The script will download the last 100 pages of replays (around 2500 maps, each roughly a 20-minute game) to a replays.json file:

python parse_many.py
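
Roughly, parse_many.py can be thought of as the module usage shown above applied to 100 pages and saved to disk (a sketch under that assumption, not the script's exact code):

import json
import replays_scraper

cookies = { "identity_sid" : "" }  # put your own cookie here

# download the last 100 pages of the replay listing and scrape them
pages = replays_scraper.download_pages(100, cookies)

replays = []
for page in pages:
    replays += replays_scraper.parse_page(page)

# store the scraped replay metadata for the next steps
with open("replays.json", "w") as f:
    json.dump(replays, f, indent=4)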

Then filter out everything except tank RB battles. This will write the filtered maps to tank_replays.json:

python filter_tanks.py
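
The filtering step boils down to keeping only the entries that describe tank RB battles; a hypothetical sketch (the field name and value used here are made up for illustration):

import json

with open("replays.json") as f:
    replays = json.load(f)

# keep only tank Realistic Battles ("battle_type" is a placeholder field name)
tank_replays = [r for r in replays if r.get("battle_type") == "tank_rb"]

with open("tank_replays.json", "w") as f:
    json.dump(tank_replays, f, indent=4)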

Now you can extract the list of maps. This will create maps.csv with all games present in tank_replays.json:

python maps_to_csv.py
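
Conceptually this is a flat export of tank_replays.json into one row per game; a sketch with placeholder column names (the real script decides the actual schema):

import csv
import json

with open("tank_replays.json") as f:
    tank_replays = json.load(f)

# write one row per game; "id" and "map" are placeholder field names
with open("maps.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["replay_id", "map"])
    for replay in tank_replays:
        writer.writerow([replay.get("id"), replay.get("map")])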

After that, we need to download and parse all replays. This command starts from the index given after its name (0 in the example), downloads each replay, parses it, and writes the result to results/{id}.json. The replays are not kept on disk after parsing, because 1000 replays would take around 70 GB of space.

python download_many.py 0
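
As a sketch, the loop can be pieced together from the module functions shown earlier (reading tank_replays.json as input, the hex ID as the results filename, and the cleanup step are all assumptions):

import json
import sys

import download_replay
import parse_replay

# start index is given on the command line, e.g. "python download_many.py 0"
start = int(sys.argv[1]) if len(sys.argv) > 1 else 0

with open("tank_replays.json") as f:
    replays = json.load(f)

for replay in replays[start:]:
    download_replay.downloadReplay(replay["id"])
    replay_id_hex = download_replay._get_hex_id(replay["id"])
    # assuming parse_replay returns a JSON-serializable result
    result = parse_replay.parse_replay(replay_id_hex)
    with open(f"results/{replay_id_hex}.json", "w") as out:
        json.dump(result, out, indent=4)
    # the downloaded .wrpl files would be removed here to save disk space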

It takes a lot of time, but you can run multiple copies with different starting offsets. For example:

python download_many.py 0

and, in a different window:

python download_many.py 500

This will start two scripts: one goes from replay 0 and the other from the 500th replay. You can run as many copies as your internet connection allows; I think four parallel instances use 100-200 Mb/s.

After you have downloaded and parsed all replays, run the players_to_csv.py script, which converts all files in the results folder into a single players.csv:

python players_to_csv.py
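
The final step just flattens the per-replay JSON results into one table; a sketch producing one row per player per replay, using the player fields visible in the example output above (the chosen columns are otherwise placeholders):

import csv
import glob
import json
import os

with open("players.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["replay_id", "player_id", "vehicles"])
    for path in glob.glob("results/*.json"):
        with open(path) as result_file:
            result = json.load(result_file)
        # the results file is named after the replay ID
        replay_id = os.path.splitext(os.path.basename(path))[0]
        for player in result.get("players", []):
            writer.writerow([replay_id, player["player_id"], ";".join(player["vehicles"])])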