Many news organizations use data from The Associated Press to power their election results reporting and real-time interactive maps. The code in this repository has been used by The Huffington Post since the 2012 Iowa caucuses to build results maps for elections including the Republican primaries, the general election and the Wisconsin recall in 2012 as well as the special elections in South Carolina and Massachusetts in 2013.
This repository is not affiliated with The Associated Press. You must have a contract with the AP and an account on its FTP server to use this code.
This repository has a single purpose: to get results off the AP's FTP server and into MySQL as fast as possible. It does not contain methods to query those results, and does not make assumptions about the front-end used to display the loaded data.
- Install the necessary gems:
  bundle install
- Create local copies of the example config files:
  cp config/ap.yml.example config/ap.yml
  cp config/database.yml.example config/database.yml
- Enter your AP credentials into `config/ap.yml` and your database credentials into `config/database.yml`, and ensure the database referenced in `database.yml` exists locally.
- Import the AP's current Massachusetts data:
  ruby crawl.rb --initialize --states=MA
The results data from the AP FTP server is now loaded into the `ap_races`, `ap_results` and `ap_candidates` tables in MySQL. On subsequent imports for the current election in Massachusetts, you do not need to include the `initialize` option. The full list of options is described below.
The AP conducts tests of its live results reporting in the weeks leading up to an election. With the `record` and `replay` parameters, you can record these tests and replay them at a later time, which is useful for development. Recordings can be stored on S3, which makes them accessible to other developers.
To record an AP test, start recording before the test begins, and stop it after the test is over:
ruby crawl.rb --record
You can now replay that test at any time:
ruby crawl.rb --replay
To store the recording on S3, create an `s3.yml` config file from the example file provided, fill in your account information, and then upload the recording:
ruby upload_replay.rb
Once uploaded, you can run that replay from any machine that has a corresponding `s3.yml`:
ruby crawl.rb --replay
By default, the newest replay is run, but you can change that with the `replaydate` option.
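As a rough illustration of that selection rule, the sketch below picks the newest recording unless a date is given. The `pick_replay` helper and the idea of passing available dates as an array are assumptions made for this example; only the YYYYMMDD date format comes from the option examples in this README.

```ruby
# Hypothetical helper (not part of this repository): choose which recorded
# replay to run. `available` is a list of recording dates as YYYYMMDD strings.
def pick_replay(available, replaydate = nil)
  raise ArgumentError, "no recordings available" if available.empty?

  # YYYYMMDD strings sort chronologically, so the maximum is the newest.
  return available.max if replaydate.nil?

  unless available.include?(replaydate)
    raise ArgumentError, "no recording for #{replaydate}"
  end
  replaydate
end

recordings = ["20130521", "20130523"]
puts pick_replay(recordings)              # prints 20130523 (newest)
puts pick_replay(recordings, "20130521")  # prints 20130521 (requested date)
```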
Posthooks let you run your own code every time new results are imported. For example, at The Huffington Post, we often bake out static pages each time results are updated.
To add a posthook, copy the example file:
cp posthook/posthook.rb.example posthook/posthook.rb
Each time results have been updated, the `run` method in your posthook will be called. You can add any code you need to that file, and add libraries or other external dependencies to the posthook directory.
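A posthook that bakes out a static page might look roughly like the sketch below. Only the method name `run` comes from this README; the `Posthook` class name, its constructor, and the `public` output directory are assumptions, so check posthook.rb.example for the structure the crawler actually expects.

```ruby
require "fileutils"

# Hypothetical posthook sketch. The class name, constructor argument and
# output path are illustrative assumptions, not taken from the repository.
class Posthook
  def initialize(output_dir = "public")
    @output_dir = output_dir
  end

  # Intended to be called after each import: writes a small static page
  # stamped with the current UTC time and returns the path it wrote.
  def run
    FileUtils.mkdir_p(@output_dir)
    path = File.join(@output_dir, "index.html")
    File.write(path, "<p>Results updated at #{Time.now.utc}</p>\n")
    path
  end
end
```

A real posthook would typically query the `ap_races`, `ap_results` and `ap_candidates` tables and render the page from that data instead of writing a timestamp.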
The following options are available to `crawl.rb`. Any option listed without examples is boolean and defaults to false.
- `states`: Comma-separated states to download
  - examples: `MA`, `MA,CA`, `all`
- `initialize`: Create initial set of results records
- `once`: Only download and import data once
- `clean`: Clean the data directories for specified states before downloading
- `interval`: Interval in seconds at which AP data will be downloaded
  - examples: `300`, `600`
- `posthook`: Run posthook after first iteration, even if results didn't change
- `record`: Record this run
- `replay`: Replay the most recent run
- `replaydate`: Specify date of replay to run
  - examples: `20130521`, `20130523`
- `replaytime`: Set the results to their state at the specified time.
- `replaytimefrom`: Run the replay from the specified time onward.
- `replaytimeto`: Run the replay up to the specified time.
- `help`: Show help dialog
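To show how a few of these options combine on the command line, here is a minimal sketch using Ruby's standard `OptionParser`. It is not crawl.rb's actual parsing code, which this README does not show: the `parse_crawl_options` helper is hypothetical, it covers only a subset of the options, and the `nil` default for `interval` is an assumption.

```ruby
require "optparse"

# Sketch only: a hypothetical parser for a subset of the options above.
# Boolean options default to false, as this README states.
def parse_crawl_options(argv)
  opts = { states: [], interval: nil, replaydate: nil,
           initialize: false, once: false, clean: false,
           record: false, replay: false }

  OptionParser.new do |o|
    # The Array type splits the value on commas: "MA,CA" => ["MA", "CA"].
    o.on("--states=LIST", Array) { |v| opts[:states] = v }
    o.on("--interval=SECONDS", Integer) { |v| opts[:interval] = v }
    o.on("--replaydate=YYYYMMDD") { |v| opts[:replaydate] = v }
    %i[initialize once clean record replay].each do |flag|
      o.on("--#{flag}") { opts[flag] = true }
    end
  end.parse!(argv.dup) # parse a copy so the caller's array is untouched

  opts
end

p parse_crawl_options(%w[--initialize --states=MA])
```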
- Jay Boice, jay.boice@huffingtonpost.com
- Aaron Bycoffe, bycoffe@huffingtonpost.com
Copyright © 2013 The Huffington Post. See LICENSE for details.