GathererImageGatherer

This project downloads all the card images from gatherer.wizards.com and saves them in the folder cardImages/, named by card name and set.

The images can be used to build a database of perceptual hashes. Since each card's artwork has a unique perceptual hash, the stored hashes can be compared against the hash of a card in a photo to identify it. Once a card is identified, its name can be entered into http://shop.tcgplayer.com/magic to quickly look up its price.
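
As a rough sketch of the idea, two hashes can be compared with the imagehash library (the file paths here are made up for illustration):

    from PIL import Image
    import imagehash

    # Hash the stored reference artwork and a photo of the card.
    reference_hash = imagehash.phash(Image.open("cardImages/some_card.jpg"))
    photo_hash = imagehash.phash(Image.open("photo_of_card.jpg"))

    # imagehash overloads subtraction to return the Hamming distance
    # between two hashes; small distances mean matching artwork.
    if reference_hash - photo_hash < 10:
        print("Likely the same card")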

Dependencies

To run these programs you will need the Python libraries BeautifulSoup, requests, imagehash, Pillow (PIL), and psycopg2. Install them with pip:

    $> pip install -r requirements.txt

or with conda:

    $> conda env create -f environment.yml

You will also need PostgreSQL and the pg_similarity extension, which is built from source:

    $> git clone https://github.com/eulerto/pg_similarity.git
    $> cd pg_similarity/
    $> USE_PGXS=1 make
    $> USE_PGXS=1 make install

Then enable the extension in postgres:

    CREATE EXTENSION pg_similarity;
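
You can verify the extension is available from Python with psycopg2 (the connection parameters are placeholders; adjust them to your setup):

    import psycopg2

    # Connect to the database where the extension was created;
    # pg_extension is the standard PostgreSQL catalog of installed extensions.
    conn = psycopg2.connect(dbname="postgres")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT extname FROM pg_extension WHERE extname = 'pg_similarity';")
        print("pg_similarity installed:", cur.fetchone() is not None)
    conn.close()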

Use

Download Images

    $> python scrapeImages.py

This downloads every card image from http://gatherer.wizards.com/Pages/Default.aspx and saves it in the folder cardImages/, named by card name and set.

The folder of pictures ends up at about 1.21 GB and takes roughly 25 minutes to download.
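
For reference, Gatherer serves card images through an Image.ashx handler. A minimal one-card version of the download step might look like this (the multiverse id and output filename are placeholders, not the actual scrapeImages.py code):

    import requests

    # Image.ashx returns the card image for a given multiverse id.
    url = "http://gatherer.wizards.com/Handlers/Image.ashx"
    resp = requests.get(url, params={"multiverseid": 383, "type": "card"})
    resp.raise_for_status()

    with open("cardImages/example.jpg", "wb") as f:
        f.write(resp.content)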

Set Up The Database

Once PostgreSQL is installed, create the database and table needed by the Python scripts:

    $> psql
    CREATE DATABASE cardimages;
    \c cardimages
    CREATE TABLE phash (name text, set text, hash text);
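
If you prefer to script this step, the same table can be created with psycopg2 (assuming the cardimages database already exists):

    import psycopg2

    conn = psycopg2.connect(dbname="cardimages")
    with conn, conn.cursor() as cur:
        # Mirrors the psql setup above.
        cur.execute("CREATE TABLE IF NOT EXISTS phash (name text, set text, hash text);")
    conn.close()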

Build The Database

    $> python buildDatabase.py

This populates a PostgreSQL database with each card's name, set, and a perceptual hash of its artwork, computed from the images downloaded by scrapeImages.py.
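
In outline, the build step amounts to hashing every downloaded image and inserting a row per card. The sketch below assumes a hypothetical "name_set.jpg" filename layout; the real script may parse filenames differently:

    import os
    import imagehash
    import psycopg2
    from PIL import Image

    conn = psycopg2.connect(dbname="cardimages")
    with conn, conn.cursor() as cur:
        for filename in os.listdir("cardImages"):
            # Split "name_set.jpg" into name and set (an assumed layout).
            name, _, set_code = filename.rsplit(".", 1)[0].rpartition("_")
            h = str(imagehash.phash(Image.open(os.path.join("cardImages", filename))))
            cur.execute(
                "INSERT INTO phash (name, set, hash) VALUES (%s, %s, %s)",
                (name, set_code, h),
            )
    conn.close()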

Test A Card

    $> python queryDatabase.py
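
This checks a picture of a card against the database. Roughly, the query hashes a test image and looks for the closest stored hash; here is a client-side sketch of that idea (the image path is a placeholder, and the real script may instead do the comparison in SQL via pg_similarity):

    import imagehash
    import psycopg2
    from PIL import Image

    query_hash = imagehash.phash(Image.open("test_card.jpg"))

    conn = psycopg2.connect(dbname="cardimages")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT name, set, hash FROM phash;")
        # Pick the row whose stored hash is closest in Hamming distance.
        best = min(cur.fetchall(),
                   key=lambda row: query_hash - imagehash.hex_to_hash(row[2]))
    conn.close()
    print("Best match:", best[0], "from set", best[1])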

TODOs

  • Reorganize folders
  • Add Docker Compose to develop without installing Postgres locally
  • Add a license (e.g. MIT)
  • Refactor the download output paths
  • Add a way to resume downloads and avoid duplicates
  • Add a way to download from other sources (like eBay or Google)
  • Refactor the download step to create a subfolder per card
  • Merge the several ways of building the dataset
  • Test all Python scripts to verify they still work after the path refactor
  • Finish the Makefile
  • Add a way to automatically run the SQL setup script once
  • Update the README with the Makefile and new sections
  • Add a notebook example