/mtgcards

Scrape data on MtG decks

Primary LanguagePythonMIT LicenseMIT

mtgcards

Scrape data on MtG decks.

Description

This is a hobby project.

It started as a card data scraping from MTG Goldfish. Then, some JumpIn! packets info scraping was added. Then, there was some play with Limited data from 17lands when I thought I had to bear with utter boringness of that format (before the dawn of Golden Packs on Arena) [This part has been deprecated and moved to archive package]. Then, I discovered I don't need to scrape anything because Scryfall.

Then, I quit (Arena).

Now, the main focus is decks package and yt module (parsing data on youtubers' decks from YT videos descriptions).

What works

  • Scryfall data management via downloading bulk data with scrython and wrapping it in convenient abstractions
  • Scraping YT channels for videos with decklists in descriptions (using no less than three Python libraries: scrapetube, pytubefix, and youtubesearchpython to avoid bothering with Google APIs)
  • Arena, Aetherhub, Archidekt, Cardhoarder, Goldfish, Moxfield, MTGAZone, MTGDecks.net, MTGTop8, Scryfall, Streamdecker, TappedOut, TCGPlayer and Untapped deck parsers work, so:
    • Arena decklists pasted into video descriptions are parsed into Deck objects
    • Aetherhub, Archidekt, Cardhoarder, Goldfish, Moxfield, MTGAZone, MTGDecks.net, MTGTop8, Scryfall, Streamdecker, TappedOut, TCGPlayer and Untapped, links contained in those descriptions are parsed into Deck objects
    • Both Untapped decklist types featured in YT videos are supported: regular deck and profile deck
    • Both old and new TCGPlayer sites are supported
    • Due to their dynamic nature, Untapped, TCGPlayer (new site) and MTGDecks.net (not much of a dynamic site, but you do need to click a consent button) are scraped using Selenium
    • All those mentioned above work even if they are behind shortener links and need unshortening first
    • Arena decklists in links to pastebin-like services (like Amazonian does) work too
  • Other decklist services are in plans (but, it does seem like I've pretty much exhausted the possibilities already :))
  • Scraping Goldfish and MGTAZone for meta-decks (others in plans)
  • Scraping a singular Untapped meta-deck decklist page
  • Exporting decks into a Forge MTG .dck format or Arena decklist saved into a .txt file - with autogenerated, descriptive names based on scraped deck's metadata
  • Importing back into a Deck from those formats
  • Export/import to other formats in plans
  • Dumping decks, YT videos and channels to .json