/mtg-metagame-scraper

Web scraper for MTG decklists

Primary language: Python. License: MIT.

🌳🔥💀💧☀️ MTG decklist scraper ☀️💧💀🔥🌳

A Magic: The Gathering deck scraper for the magic and mtgmelee websites that enriches the scraped decks with data from the Scryfall API.

Project structure was created with this ETL blueprint.

💾 Data

↩️ Extraction

Data is scraped from the websites using Selenium with headless Chrome and stored as-is in the data/raw folder, tagged by site and tournament.
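A minimal sketch of this step is shown here. The folder layout under data/raw, the helper names (`raw_path`, `save_raw`, `fetch_page`) and the file suffix are assumptions made for illustration, not the project's actual code. The Selenium import is kept inside `fetch_page` so the storage helpers can be used without a browser installed.

```python
from pathlib import Path


def raw_path(root: Path, site: str, tournament: str, suffix: str = ".html") -> Path:
    """Build <root>/data/raw/<site>/<tournament><suffix> (hypothetical layout)."""
    # Keep only filesystem-safe characters in the tournament tag.
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in tournament)
    return root / "data" / "raw" / site / f"{safe}{suffix}"


def save_raw(root: Path, site: str, tournament: str, html: str) -> Path:
    """Store the page source untouched, tagged by site and tournament."""
    path = raw_path(root, site, tournament)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path


def fetch_page(url: str) -> str:
    """Fetch a rendered page with headless Chrome (needs selenium and Chrome)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

With these helpers, a call such as `save_raw(Path("."), "mtgmelee", "Weekly Open #3", fetch_page(url))` would write `data/raw/mtgmelee/Weekly_Open__3.html`.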

🔄 Transform

The raw data is normalized into JSON files with a common format in data/processed and enriched with data retrieved from the Scryfall API (prices, images, etc.).
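The enrichment could look roughly like the sketch below. It uses Scryfall's `cards/named` endpoint with the `exact` parameter and reads a few fields of the returned card object. The output shape, the function names and the User-Agent string are assumptions for illustration. The lookup is passed in as a parameter so the normalization can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

SCRYFALL_NAMED = "https://api.scryfall.com/cards/named"


def fetch_scryfall(name: str) -> dict:
    """Look up one card by exact name on the Scryfall API (network call)."""
    url = SCRYFALL_NAMED + "?" + urllib.parse.urlencode({"exact": name})
    request = urllib.request.Request(
        url,
        # Placeholder client identification, chosen for this example only.
        headers={"User-Agent": "mtg-scraper-example/0.1", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def normalize_card(name: str, quantity: int, lookup=fetch_scryfall) -> dict:
    """Return one card entry in a common shape (hypothetical), with Scryfall fields."""
    card = lookup(name)
    return {
        "name": name,
        "quantity": quantity,
        "cmc": card.get("cmc"),
        "type_line": card.get("type_line"),
        "colors": card.get("colors", []),
        "set": card.get("set"),
        "rarity": card.get("rarity"),
        # Prices are strings or null in Scryfall responses.
        "price_usd": (card.get("prices") or {}).get("usd"),
        # Some cards have no top-level image_uris, so fall back to None.
        "image": (card.get("image_uris") or {}).get("normal"),
    }
```

A processed deck file would then be a JSON document holding a list of such entries plus the site, tournament and finishing position.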

⤴️ Load

Each deck and card is loaded from the JSON files into the PostgreSQL database.
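As an illustration of the load step, the sketch below inserts one processed deck and its cards through a DB-API connection such as the one returned by `psycopg2.connect`. The table and column names (`decks`, `cards`, `position` and so on) are a hypothetical schema, not the project's actual one.

```python
import json
from pathlib import Path

# Hypothetical schema: decks(id, site, tournament, position)
# and cards(deck_id, name, quantity).
INSERT_DECK = (
    "INSERT INTO decks (site, tournament, position) "
    "VALUES (%s, %s, %s) RETURNING id"
)
INSERT_CARD = "INSERT INTO cards (deck_id, name, quantity) VALUES (%s, %s, %s)"


def card_rows(deck_id: int, deck: dict) -> list:
    """Turn a processed deck dict into parameter tuples for INSERT_CARD."""
    return [(deck_id, card["name"], card["quantity"]) for card in deck["cards"]]


def load_deck(conn, deck: dict) -> int:
    """Insert one deck and its cards; returns the new deck id."""
    with conn.cursor() as cur:
        cur.execute(INSERT_DECK, (deck["site"], deck["tournament"], deck["position"]))
        deck_id = cur.fetchone()[0]
        cur.executemany(INSERT_CARD, card_rows(deck_id, deck))
    conn.commit()
    return deck_id


def load_folder(conn, folder: Path) -> int:
    """Load every JSON file in a folder; returns the number of decks loaded."""
    count = 0
    for path in sorted(folder.glob("*.json")):
        load_deck(conn, json.loads(path.read_text(encoding="utf-8")))
        count += 1
    return count
```

Passing values as query parameters instead of formatting them into the SQL string lets the driver handle quoting of card names that contain apostrophes or commas.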

📊 Visualization

Expose statistics about the given decks and their card attributes.

Analysis

The goal of this project is to analyze the common factors among the top-finishing decks in different tournaments with more than 64 registered players. Card properties such as type, set, colors, prices and mana cost are the ones provided by the Scryfall API.

⚠️ Limitation note: prices are currently those at the time of data extraction, not at the date of the tournament.

Metagame

First, let's dive into the current metagame (October-November 2020).

Color distribution

decks-colors
Color count among decks

As most players know, the design team at Wizards of the Coast has been pushing green in the Standard format for quite some time now. Red comes in second place, so do the two colors appear together in the same decks?

💡 Red and green decks are popular.

name-cloud
Card frequency in all the registered tournaments
name-bar
Card frequency in all the registered tournaments

The most popular card is the red Bonecrusher Giant with 2292 appearances out of 51523 total cards in decks, followed by Lovestruck Beast and Edgewall Innkeeper, both green cards, with 1950 and 1642 appearances. In fourth place comes the red-green Brushfire Elemental with 1399, and Ox of Agonas closes the top 5 with 1264.

💡 This top 5 alone accounts for about 16.6% of all cards (8547 of 51523).
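The frequency count and the top-5 share can be reproduced with a few lines. This is a sketch that assumes each deck is available as a list of `(name, quantity)` pairs; the function names are made up for the example. The last lines recompute the share from the five counts and the total quoted above.

```python
from collections import Counter


def card_frequency(decks):
    """Count card copies across decks; each deck is a list of (name, quantity)."""
    counts = Counter()
    for deck in decks:
        for name, quantity in deck:
            counts[name] += quantity
    return counts


def top_share(counts, n):
    """Fraction of all counted cards taken by the n most frequent names."""
    total = sum(counts.values())
    top = sum(quantity for _, quantity in counts.most_common(n))
    return top / total if total else 0.0


# Check with the figures quoted in the text: five counts and the 51523 total.
top5 = [2292, 1950, 1642, 1399, 1264]
print(sum(top5))                           # 8547
print(round(100 * sum(top5) / 51523, 1))   # 16.6
```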

Type distribution

types-square
Card types count in all tournaments
types-bar
Card types count in all tournaments

Of the 51523 cards registered in tournaments, 22678 were creatures, 13109 nonbasic lands, 10263 instants, 8322 sorceries and 6295 enchantments, with every other type below 2500.

💡 This gives a clear picture that aggressive decks are abundant.
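One way to obtain type counts like the ones above is to parse the `type_line` field that Scryfall returns for each card. The sketch below is an assumption about how that could be done, not the project's code. It counts a card once for every card type it has, so a card with two types contributes to both, and the per-type counts can add up to more than the total number of cards. It does not separate basic from nonbasic lands.

```python
from collections import Counter

CARD_TYPES = {
    "Artifact", "Creature", "Enchantment", "Instant",
    "Land", "Planeswalker", "Sorcery",
}


def card_types(type_line: str) -> list:
    """Extract the card types from a Scryfall-style type line."""
    found = []
    # Faces of double-faced cards are separated by "//".
    for face in type_line.split("//"):
        # Subtypes come after a long dash (U+2014); keep only the part before it.
        main = face.split("\u2014")[0]
        for word in main.split():
            if word in CARD_TYPES and word not in found:
                found.append(word)
    return found


def type_counts(cards):
    """cards: iterable of (type_line, quantity); a card counts once per type."""
    counts = Counter()
    for type_line, quantity in cards:
        for card_type in card_types(type_line):
            counts[card_type] += quantity
    return counts
```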

Converted Mana Cost Distribution

Most spells have a converted mana cost (an integer representing how much mana the spell costs). To interact with their opponent in time, a player needs a "mana curve" that fits the metagame, that is, a proper distribution of the mana costs of their spells:

mana-curve-example
Magic mana curve example filtered by creatures only

There is no use in building a deck with powerful cards if you can't play them before turn 5 while your opponent outraces you with cheap creatures.

cmc_dist
Converted mana cost distribution
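A mana curve is simply a histogram of spell costs. The sketch below builds one from `(cmc, quantity, is_land)` tuples, a made-up input shape for this example. Folding every cost of 7 or more into a single bucket is a presentation choice of the sketch, not something taken from the project.

```python
from collections import Counter


def mana_curve(cards, cap=7):
    """cards: iterable of (cmc, quantity, is_land). Returns {cost: copies}."""
    curve = Counter()
    for cmc, quantity, is_land in cards:
        if is_land:
            continue  # lands are not spells, so they are left out of the curve
        bucket = min(int(cmc), cap)  # costs at or above cap share one bucket
        curve[bucket] += quantity
    return dict(sorted(curve.items()))


def render(curve):
    """Text histogram with one row per mana cost."""
    return "\n".join(f"{cost:>2} | {'#' * copies}" for cost, copies in curve.items())
```

For example, `render(mana_curve(deck))` gives a quick text view of whether a deck is weighted toward cheap or expensive spells.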

Set distribution

types-square
Card types count for each set in all tournaments

Aggregating by expansion set, we can see how much each set contributes to each card type in the current metagame. Throne of Eldraine (ELD) and Zendikar Rising (ZNR) contribute the largest number of creatures, favoring aggro decks. The more type-focused sets, Theros Beyond Death (THB) and again Zendikar Rising (ZNR), add enchantments and lands. On the other hand, the sets with fewer cards in the metagame, like M21 and Ikoria (IKO), add instants more frequently than the other sets.

Rarity distribution

MTG cards have different chances of coming up in a booster pack: of the fifteen playing cards included, one is a basic land, ten are common, three are uncommon, and one is rare (76%) or mythic rare (24%). Mythics and rares therefore tend to be more valuable because of their scarcity, especially if their mechanics are useful in-game.

count-prices-rarity
Card rarity count and price in all tournaments

💡 Rare cards are the most played and mythics are the most expensive on average.

Winner decks

So, which of the card attributes shown above are related to the finishing position of a deck in a tournament? Let's start by checking the correlation between the position of the deck and all the other variables. Because a lower position is better, the lower (more negative) the correlation value, the more strongly a variable is associated with finishing well.

correlations
Correlation between variables
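The source does not say which correlation coefficient was used. As a sketch, assuming the Pearson coefficient, the correlation between finishing position and any numeric deck attribute can be computed as follows. Position 1 is the best finish, so a negative coefficient means that larger values of the attribute go with better finishes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Since finishing positions are ranks, a rank-based coefficient such as Spearman's could be a reasonable alternative; it amounts to applying the same formula to the ranks of both variables.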

Color

winner_colors decks_colors
1st place deck colors Average deck colors by tournament

Compared to the average deck, blue and black appear more often among first-place decks. White appears in the same proportion in winning decks and average decks.

Type

winner_types
1st place card types

Set

winner_sets
1st place card sets

CMC

winner_cmc
1st place CMC

Price

deck_prices_by_tournament
Average deck prices by tournament

There is a very slight tendency for the winning decks to be cheaper.
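For reference, a deck price can be derived from the Scryfall price of each card times its quantity. Scryfall reports prices as strings and uses null when no price is known, so the sketch below tracks cards without a price separately instead of treating them as free. The input shape and the function name are assumptions made for this example.

```python
def deck_price(cards):
    """cards: iterable of (price_usd, quantity); price is a string or None.

    Returns (total price in USD, number of copies without a known price).
    """
    total = 0.0
    missing = 0
    for price_usd, quantity in cards:
        if price_usd is None:
            missing += quantity  # no price available for this card
            continue
        total += float(price_usd) * quantity
    return round(total, 2), missing
```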