/gthockey-stats

Scrapes player statistics from achahockey.org. It is intended to be used as a cron job for the gthockey PHP server.

Primary Language: Python

STATS SCRAPING

This repo contains code to scrape player statistics from the achahockey.org web page. It is a proof of concept, developed to reduce the manual labor of data entry.

Dependencies

Running the Code

After cloning the repo, you can scrape data as follows:

$ scrapy crawl acha

Saving the Results

Scrapy supports several formats for storing scraped data. To store the results as JSON, CSV, or XML, run the respective command:

$ scrapy crawl acha -o items.json -t json
$ scrapy crawl acha -o items.csv -t csv
$ scrapy crawl acha -o items.xml -t xml
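
Once a JSON file has been written, it can be read back with the standard library. The following is a minimal sketch, not part of this repo: the helper names load_items and field_names are hypothetical, and the only assumption about the file is that Scrapy's JSON export is a single JSON array of objects. The actual field names depend on the spider, so they are discovered from the data rather than hard-coded.

```python
import json
from pathlib import Path


def load_items(path):
    """Load a Scrapy JSON export, which is one JSON array of objects."""
    with Path(path).open(encoding="utf-8") as handle:
        items = json.load(handle)
    if not isinstance(items, list):
        raise ValueError(f"expected a JSON array in {path}")
    return items


def field_names(items):
    """Return the sorted set of keys that appear in any scraped item."""
    return sorted({key for item in items for key in item})
```

For example, load_items("items.json") returns a list of dictionaries, and field_names on that list shows which statistics the spider collected.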

Scripting the Routines

There is now a script that runs the scraping routines automatically. It is intended for future use as a CGI script on a NearlyFreeSpeech web server.

$ python crawl.py

This command scrapes the data and stores it in a JSON file.
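
The contents of crawl.py are not reproduced here. As an illustration of the idea only, a wrapper of this kind could simply invoke the same scrapy command shown earlier. In the sketch below, the function names build_command and run_crawl, the default output file items.json, and the project_dir parameter are all assumptions made for the example and may differ from what the real script does.

```python
import subprocess


def build_command(spider="acha", output="items.json"):
    """Build the command line: scrapy crawl <spider> -o <output>."""
    return ["scrapy", "crawl", spider, "-o", output]


def run_crawl(spider="acha", output="items.json", project_dir="."):
    """Run the crawl from the Scrapy project directory.

    Returns the exit code of the scrapy process (0 on success).
    Requires the scrapy executable to be on PATH.
    """
    command = build_command(spider, output)
    completed = subprocess.run(command, cwd=project_dir, check=False)
    return completed.returncode
```

Calling run_crawl() from the directory that holds scrapy.cfg would then be equivalent to typing the crawl command by hand, which makes it easy to call from a cron job or a CGI handler.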

Issues

For more information on how to use Scrapy, please see the Scrapy Reference.

Contributing

This is an open source project. Feel free to fork it and submit pull requests at will.