PY-RETROSHEET
Python scripts for Retrosheet data downloading and parsing.
YE REQUIREMENTS
- Chadwick 0.6.2: http://chadwick.sourceforge.net/
- Python 2.5+ or Python 3.4+
- sqlalchemy: http://www.sqlalchemy.org/
- [if using postgres] psycopg2 Python package (dependency for sqlalchemy)
USAGE
Setup
cp scripts/config.ini.dist scripts/config.ini
Edit scripts/config.ini as needed. See the steps below for what might need to be changed.
Download
python download.py [-y <4-digit-year> | --year <4-digit-year>]
The scripts/download.py script downloads Retrosheet data. Edit the config.ini file to configure which types of files are downloaded. Optionally, set the year to download with the -y (or --year) command-line argument.
- download > dl_eventfiles determines whether Retrosheet Event Files are downloaded. These are the only files that parse.py can process at this time.
- download > dl_gamelogs determines whether Retrosheet Game Logs are downloaded. parse.py cannot process these at this time.
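As a rough illustration of the two settings above, here is a minimal Python 3 sketch that reads them with the standard configparser module. The section name (download) and key names come from this README; the True/False values and the read_download_flags helper are assumptions for illustration, not necessarily how download.py reads its config.

```python
import configparser

# Hypothetical config.ini fragment. The section and key names follow this
# README; the True/False value format is an assumption.
SAMPLE_CONFIG = """
[download]
dl_eventfiles = True
dl_gamelogs = False
"""


def read_download_flags(text):
    """Return which Retrosheet file types the config asks to download."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "eventfiles": parser.getboolean("download", "dl_eventfiles"),
        "gamelogs": parser.getboolean("download", "dl_gamelogs"),
    }


flags = read_download_flags(SAMPLE_CONFIG)
print(flags)
```

With the sample values, only event files would be fetched, which matches what parse.py can currently process.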
Parse into SQL
python parse.py [-y <4-digit-year>]
After the files have been downloaded, parse them into SQL with parse.py.
- Create database called retrosheet (or whatever).
- Add schema to the database w/ the included SQL script (the .postgres.sql one works nicely w/ PG, the other w/ MySQL).
- Configure the file config.ini with your appropriate ENGINE, USER, HOST, PASSWORD, and DATABASE values. If you're using postgres, you can optionally define SCHEMA and download directory.
  - Valid values for ENGINE are valid sqlalchemy engines, e.g. 'mysql', 'postgresql', or 'sqlite'.
  - If you have your server configured to allow passwordless connections, you don't need to define USER and PASSWORD.
  - If you are using sqlite3, database in the config should be the path to your database file.
  - Specify directory for retrosheet files to be downloaded to; it needs to exist before the script runs.
- Run parse.py to parse the files and insert the data into the database (optionally use -y YYYY to import just one year).
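To see how the ENGINE, USER, HOST, PASSWORD, and DATABASE values fit together, here is a minimal sketch that assembles them into a SQLAlchemy-style connection URL (engine://user:password@host/database). The build_db_url helper is hypothetical and written for Python 3; parse.py may build its connection differently.

```python
from urllib.parse import quote_plus


def build_db_url(engine, database, user=None, password=None, host=None):
    """Assemble a SQLAlchemy-style URL from the config.ini values.

    Hypothetical helper for illustration only.
    """
    if engine == "sqlite":
        # For sqlite, the database value is the path to the database file,
        # and no user, password, or host applies.
        return "sqlite:///" + database
    auth = ""
    if user:
        # Percent-encode credentials so characters like '@' stay unambiguous.
        auth = quote_plus(user)
        if password:
            auth += ":" + quote_plus(password)
        auth += "@"
    # With passwordless connections, the user/password part is simply omitted.
    return "{0}://{1}{2}/{3}".format(engine, auth, host or "", database)


print(build_db_url("postgresql", "retrosheet", host="localhost"))
print(build_db_url("sqlite", "retrosheet.db"))
```

A URL like this is what sqlalchemy.create_engine() accepts, so it can be a quick way to check that the values in config.ini describe a reachable database.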
USAGE (Python 3.5.0 & MySQL 5.6+ Only)
Download
python retrosheet_download.py [-f <from 4-digit-year>] [-t <to 4-digit-year>] [-c <config.ini path>]
Parse
python parse_csv.py [-f <from 4-digit-year>] [-t <to 4-digit-year>] [-c <config.ini path>]
Into SQL
python retrosheet_mysql.py [-f <from 4-digit-year>] [-t <to 4-digit-year>] [-c <config.ini path>]
Migration (Download - Parse - Into SQL)
python migration.py [-f <from 4-digit-year>] [-t <to 4-digit-year>] [-c <config.ini path>]
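For reference, the three separate steps share the same -f, -t, and -c flags, so running them in order for one year range can be sketched as below. The script names and flags are taken from the usage lines above; the build_commands helper and the default config path are assumptions, and this is not necessarily what migration.py does internally.

```python
import sys

# Script names come from the usage lines above, in the order they are run.
STEPS = ["retrosheet_download.py", "parse_csv.py", "retrosheet_mysql.py"]


def build_commands(from_year, to_year, config_path="config.ini"):
    """Return one argv list per step (download, parse, load into MySQL)."""
    if from_year > to_year:
        raise ValueError("from_year must not be later than to_year")
    args = ["-f", str(from_year), "-t", str(to_year), "-c", config_path]
    return [[sys.executable, script] + args for script in STEPS]


for command in build_commands(2010, 2014):
    print(" ".join(command))
```

Each list could be passed to subprocess.check_call() in turn, so a failure in one step stops the later ones.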
YE GRATITUDE
GitHub user jeffcrow made many fixes and additions and added sqlite support.
JUST THE DATA
If you're using PostgreSQL (and you should be), you can get a dump of all data up through 2014 (warning: 502MB) here