- Replay Parsing: Parses replays of Dota 2 matches to provide additional statistics per match.
- Item build times
- Pick order
- Number of pings
- Stun/disable time
- Consumables bought
- Runes picked up
- Laning position heatmap
- Ward placement map
- LHs per min table
- Radiant advantage/Gold/XP/LH graphs per min
- Teamfight summary
- Objective times
- Largest hit on a hero
- Ability uses/hits
- Item uses
- Gold/XP breakdown
- Damage/Kills crosstables
- Multikills/Kill streaks
- All chat
- Advanced Querying: Supports flexible querying and aggregation with the following criteria:
- Player(s) in game (account ID)
- Team composition (heroes)
- Opponent composition (heroes)
- Standard filters: patch, game mode, hero, etc.
- Aggregations:
- Result count, win rate
- Win rate by hour/day of week
- Histogram (number of matches across Duration, LH, HD, TD, K, D, A, etc.)
- Hero Matchups (win rate when playing as, with, against a hero)
- Teammates/Opponents (win rate playing with/against particular players)
- Max/N/Sum on multiple stat categories
- Mean item build times
- Skill accuracy
- Laning
- Ward maps
- Word clouds (text said and read in all chat)
- Pro Games: Professional matches are automatically parsed
- Comparison Tool: Computes a percentile for a player against all users
- Rating Tracker: Keep track of MMR by adding a Steam account as a friend
- Modular: Microservice architecture, with pieces that can be used independently
- Scalable: Designed to scale to thousands of users.
- Free: No "premium" features. All data is available for free to users.
- Open Source: All code is publicly available for feedback and contributions from the Dota 2 developer community.
- Web: Node.js/Express
- Storage: MongoDB/Redis
- Parser: Java (powered by clarity)
- Install dependencies. On Debian/Ubuntu:
  ```
  sudo bash init.sh
  ```
  On other platforms, you are responsible for installing the dependencies yourself.
- Create a `.env` file with the required config values in `KEY=VALUE` format (see `config.js` for a full listing of options):
  ```
  cp .env_example .env
  ```
- Build:
  ```
  npm run build
  ```
- Run all services in dev mode (under `nodemon`, so file changes automatically restart the server):
  ```
  npm run dev
  ```
  You can also start individual services.
- If you want a medium-sized data set to work with, import a database dump:
  ```
  wget https://github.com/yasp-dota/testfiles/raw/master/dota.zip && unzip dota && mongorestore --dir dota
  ```
- The project uses a microservice architecture to promote modularity and allow different pieces to scale on different machines.
- Descriptions of each service:
- web: An HTTP server which serves the web traffic.
- retriever: A standalone HTTP server that accepts the URL params `match_id` and `account_id`, and interfaces with the Steam GC in order to return match details/account profiles (an example request is sketched below).
  - Accessing it without any params returns a list of the registered Steam accounts, and a hash mapping friends of those accounts to the Steam account.
  - This is used in order to determine the list of users that have added a tracker as a friend.
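  A hedged sketch of querying a locally running retriever; the port, the `match_id` value, and the assumption of a JSON response are illustrative, not documented values:

  ```js
  // Hypothetical usage sketch: ask the retriever for match details.
  // The port (5100) and match_id here are placeholders.
  const http = require('http');

  http.get('http://localhost:5100/?match_id=123456789', (res) => {
    let body = '';
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => console.log(JSON.parse(body))); // assumes a JSON response
  });
  ```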
- worker: Takes care of background tasks. Currently, this involves re-queueing active tasks on restart, and rebuilding the sets of tracked players, donated players, rating players, etc.
- parser: A standalone HTTP server that accepts a URL param `url`. It expects a compressed replay file (`.dem.bz2`) at this location, which it downloads, streams through `bunzip2`, and then through the compiled parser (a condensed sketch of this flow follows below).
  - Each parser maintains a copy of the current heroes, which is used to map combat log names ("npc_dota_hero...") to `hero_id`, which can be used to match combat log units to a player.
  - The parser emits a newline-delimited JSON stream of events, which is picked up and combined into a monolithic JSON object sent back to the client as the response.
  - The schema for the current parsed_data structure can be found in `utility.getParseSchema`.
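  A condensed sketch of the download → `bunzip2` → parser flow; the jar invocation and helper name are illustrative, not the project's actual code:

  ```js
  // Illustrative only: stream a replay through bunzip2 and the parser,
  // then combine the newline-delimited JSON events into a single array.
  const { spawn } = require('child_process');
  const https = require('https');

  function parseReplay(url, cb) {
    const bunzip2 = spawn('bunzip2');                     // decompress the .dem.bz2 stream
    const parser = spawn('java', ['-jar', 'parser.jar']); // compiled parser; path assumed
    https.get(url, (res) => res.pipe(bunzip2.stdin));
    bunzip2.stdout.pipe(parser.stdin);

    let buffer = '';
    const events = [];
    parser.stdout.on('data', (chunk) => {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop();                               // keep any partial line for the next chunk
      lines.filter(Boolean).forEach((l) => events.push(JSON.parse(l)));
    });
    parser.stdout.on('end', () => cb(null, events));
  }
  ```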
- parseManager: Reads Redis to find the currently available list of parse workers. A single endpoint may appear multiple times (once per core it has).
  - This uses the Node cluster module to fork as many workers as there are available parsing cores (a sketch of this pattern follows below).
  - Each one processes parse jobs in Kue.
  - Processing a job entails:
    - Get the replay URL: `getReplayUrl` takes care of this.
    - Send a request to a parse worker.
    - Read the response from the parse worker and save it as `match.parsed_data`.
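  A minimal sketch of the fork-per-core pattern using Kue's standard API; the queue name and capacity value are illustrative:

  ```js
  // Illustrative only: fork one worker per parsing core, each consuming
  // parse jobs from Kue one at a time.
  const cluster = require('cluster');
  const kue = require('kue');

  const capacity = 4; // in the real service this comes from the parser list in Redis

  if (cluster.isMaster) {
    for (let i = 0; i < capacity; i += 1) cluster.fork();
  } else {
    const queue = kue.createQueue();
    queue.process('parse', 1, (job, done) => {
      // 1. get the replay URL, 2. send it to a parse worker,
      // 3. save the response as match.parsed_data (all omitted here)
      done();
    });
  }
  ```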
- scanner: Reads the Steam sequential API to find the latest matches. If a match is found passing the criteria for parse, `operations.insertMatch` is called. If `match.parse_status` is explicitly set to 0, the match is queued for parse.
- proxy: A standalone HTTP server that simply proxies all requests to the Steam API. The host is functionally equivalent to `api.steampowered.com`.
- skill: Reads the GetMatchHistory API in order to continuously find matches of a particular skill level.
  - Applying the following filters increases the number of matches we can get skill data for: `min_players=10`, `hero_id=X`.
  - By permuting all three skill levels with the list of heroes, we can get up to 500 matches for each combination (see the sketch below).
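  A sketch of the permutation loop described above; `fetchMatchHistory` is a hypothetical helper wrapping the GetMatchHistory endpoint:

  ```js
  // Illustrative only: iterate every (skill level, hero) combination.
  const skillLevels = [1, 2, 3]; // Normal, High, Very High
  const heroIds = [1, 2, 3];     // in practice, the full hero list from constants

  function fetchMatchHistory(params) {
    // would call GetMatchHistory with these params and page through results
    console.log('querying', params);
  }

  for (const hero_id of heroIds) {
    for (const skill of skillLevels) {
      // min_players=10 and hero_id narrow the results so more matches get
      // skill data; each combination can yield up to 500 matches.
      fetchMatchHistory({ skill, hero_id, min_players: 10 });
    }
  }
  ```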
- mmr: Processes MMR requests.
- fullhistory: Processes full history requests.
- Pipeline: Generally, parses come in one of two ways:
  - Sequential: We read a match from the Steam API that either has `leagueid>0` or contains a player in the `trackedPlayer` set.
  - Request: Requests are processed from the Request page. This reads the match data from the Steam API, then uses `operations.insertMatchProgress` in order to force waiting for the parse to finish.
    - The client uses AJAX to poll the server (see the sketch after this list). When an error occurs or the job finishes, it either displays the error or redirects to the match page.
    - Requests are set to only try once.
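  A hedged sketch of that client-side polling loop; the route and the response fields (`error`, `complete`, `match_id`) are assumptions for illustration:

  ```js
  // Illustrative only: poll the server until the parse job errors or finishes.
  function pollJob(jobId) {
    const xhr = new XMLHttpRequest();
    xhr.open('GET', '/request_job?id=' + jobId); // route name is assumed
    xhr.onload = () => {
      const job = JSON.parse(xhr.responseText);
      if (job.error) {
        document.getElementById('status').textContent = job.error; // show the error (element assumed)
      } else if (job.complete) {
        window.location.href = '/matches/' + job.match_id;         // go to the match page
      } else {
        setTimeout(() => pollJob(jobId), 2000);                    // try again in 2s
      }
    };
    xhr.send();
  }
  ```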
- Player/match caching: We cache matches in Redis in order to reduce DB lookups on repeated loads.
- Player caching is more complicated. It means that whenever we add a match or add parsed data to a match, we need to update all of that match's player caches to reflect the change (to keep the cache valid).
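  A sketch of the cache-update idea, assuming node_redis; the key scheme and TTL are illustrative, not the project's actual layout:

  ```js
  // Illustrative only: when a match gains (parsed) data, refresh each
  // player's cached view of it so the cache stays consistent with the DB.
  const redis = require('redis').createClient();

  function updatePlayerCaches(match) {
    match.players.forEach((p) => {
      const key = 'player:' + p.account_id;
      redis.get(key, (err, cached) => {
        if (err || !cached) return; // nothing cached for this player
        const data = JSON.parse(cached);
        data.matches = data.matches || {};
        data.matches[match.match_id] = { parse_status: match.parse_status };
        redis.setex(key, 24 * 60 * 60, JSON.stringify(data)); // rewrite with a 24h TTL
      });
    });
  }
  ```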
- A client-side bundle of JS is built (and minified in production) using Webpack. If you want to make changes to client-side JS, run the watch script `npm run watch` to automatically rebuild after making changes.
- Recommended command-line tools for developers:
  ```
  sudo npm install -g mocha foreman nodemon
  ```
  - `mocha` is used to run the tests. Installing the command-line tool gives you greater control over which tests to run.
  - `foreman` is used to run services individually. The executable name is `nf`.
  - `nodemon` watches the server files and restarts the server when changes are detected.
- Tests: Run `npm test` to run the full test suite.
- Brief snippets and useful links are included in the wiki.
- Constants are currently built pre-run and written to a file.
- web requires constants.
- fullhistory requires constants (it needs to iterate through heroes).
- parser requires constants (for building the parsed_data object).
- buildSets is currently built by worker; it includes getRetriever and getParser, which are service discovery and could be separated from the actual set building.
- scanner requires buildSets in order to avoid leaking players, and retries until available (a sketch of this retry pattern follows below).
- parseManager requires getRetrievers to get the replay URL, and retries until available.
- parseManager requires getParsers, since we need to set concurrency before starting, and retries until available.
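  A sketch of that retry-until-available pattern; the Redis key and delay are illustrative:

  ```js
  // Illustrative only: keep polling Redis until service discovery data exists.
  function getParsers(redisClient, cb) {
    redisClient.get('parsers', (err, result) => {
      if (err || !result) {
        // buildSets hasn't produced the list yet; retry shortly.
        return setTimeout(() => getParsers(redisClient, cb), 10 * 1000);
      }
      cb(null, JSON.parse(result));
    });
  }
  ```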
- Project started in August 2014
- Originally forked from Rjacksonm1/matchurls, started in July 2013
- howardchung
- albertcui
- nickhh