An analysis of NYC taxicab data using Python and Spark
Download the full dataset here: http://www.andresmh.com/nyctaxitrips/ or use the subset in data/
Download weather data using python/get_weather_data.py (fill in your forecast.io API key in the script first)
Fix the hardcoded paths in python/generate-models.py so they point to the correct data and python directories
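Until the hardcoded paths are cleaned up, one way to avoid editing the script by hand is to read both directories from the environment. The following is a minimal sketch only, not the code in python/generate-models.py: the helper resolve_dirs and the variable names TAXI_DATA_DIR and TAXI_PYTHON_DIR are hypothetical, and the defaults assume the script is run from the repository root.

```python
import os
from pathlib import Path


def resolve_dirs(env=None):
    """Return (data_dir, python_dir) as absolute paths.

    TAXI_DATA_DIR and TAXI_PYTHON_DIR are hypothetical variable names
    chosen for this sketch; the repository does not define them.
    """
    env = os.environ if env is None else env
    # Fall back to the repository-relative directories when a variable is unset.
    data_dir = Path(env.get("TAXI_DATA_DIR", "data")).expanduser().resolve()
    python_dir = Path(env.get("TAXI_PYTHON_DIR", "python")).expanduser().resolve()
    return data_dir, python_dir


if __name__ == "__main__":
    data_dir, python_dir = resolve_dirs()
    print("data directory:  ", data_dir)
    print("python directory:", python_dir)
```

With something like this in place, the two directories could be set in the shell before calling spark-submit instead of being edited in the source.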
Run locally with spark-submit
TODO: Clean up hardcoded paths
NOTE: This is still a work in progress. The model developed here is for exposition only.