nyc-taxi-spark-ml

Example Python Spark machine learning on NYC taxi data

Primary language: Python
License: Apache License 2.0 (Apache-2.0)

Project Description

An analysis of NYC taxi cab data using Python and Spark.

(Incomplete) Instructions

Download the full dataset from http://www.andresmh.com/nyctaxitrips/, or use the subset in data/.

Download the weather data using python/get_weather_data.py (fill in your forecast.io API key first).
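The script itself is not shown here, so the following is only a minimal sketch of how a historical weather request for a taxi trip might be addressed. It assumes the forecast.io "Time Machine" URL layout (`/forecast/<key>/<lat>,<lon>,<unix time>`); the function name `weather_url`, the placeholder key, and the coordinates are hypothetical. Note that forecast.io later became Dark Sky, whose API was retired in 2023, so the endpoint may no longer respond.

```python
from datetime import datetime, timezone
from urllib.parse import quote

# Assumed forecast.io "Time Machine" layout:
# https://api.forecast.io/forecast/<API_KEY>/<LAT>,<LON>,<UNIX_TIME>
FORECAST_BASE = "https://api.forecast.io/forecast"


def weather_url(api_key, lat, lon, when):
    """Build a (hypothetical) request URL for the weather at (lat, lon) at `when`."""
    # `when` should be timezone-aware so the UNIX timestamp is unambiguous.
    unix_time = int(when.timestamp())
    return f"{FORECAST_BASE}/{quote(api_key)}/{lat},{lon},{unix_time}"


if __name__ == "__main__":
    # Placeholder key and approximate Manhattan coordinates (hypothetical values).
    url = weather_url("YOUR_API_KEY", 40.7128, -74.006,
                      datetime(2013, 1, 1, 12, 0, tzinfo=timezone.utc))
    print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) and joining the hourly results to trips by pickup time would be the next step.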

Fix the hardcoded paths in python/generate-models.py so they point to your data and python directories.
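One way to remove the hardcoded paths is to resolve them at startup from command-line flags, falling back to environment variables and then to the repository defaults. This is a sketch under assumptions: the flag names `--data-dir`/`--python-dir`, the environment variables `TAXI_DATA_DIR`/`TAXI_PYTHON_DIR`, and the function `parse_paths` are all hypothetical, not part of generate-models.py.

```python
import argparse
import os


def parse_paths(argv=None):
    """Resolve the data and python directories: CLI flag > env var > default."""
    parser = argparse.ArgumentParser(description="Generate models from NYC taxi data")
    # Hypothetical flags; the env var is read once, when the parser is built.
    parser.add_argument("--data-dir",
                        default=os.environ.get("TAXI_DATA_DIR", "data"),
                        help="directory holding the trip and weather data")
    parser.add_argument("--python-dir",
                        default=os.environ.get("TAXI_PYTHON_DIR", "python"),
                        help="directory holding the project's python modules")
    args = parser.parse_args(argv)
    # Absolute paths behave the same regardless of where spark-submit is run from.
    return os.path.abspath(args.data_dir), os.path.abspath(args.python_dir)


if __name__ == "__main__":
    data_dir, python_dir = parse_paths()
    print("data:", data_dir)
    print("python:", python_dir)
```

Keeping the defaults relative to the repository means a fresh checkout with the subset in data/ runs without any configuration.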

Run locally with spark-submit.
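A local run passes the script to spark-submit, with `--master local[*]` to use all local cores; any arguments after the script path are forwarded to the script. The sketch below only builds that command (the helper `spark_submit_cmd` and the `--data-dir` argument are hypothetical, and the latter only works if the script is changed to accept it).

```python
import shlex


def spark_submit_cmd(script="python/generate-models.py", master="local[*]", app_args=()):
    """Build a spark-submit argument list; application args follow the script path."""
    return ["spark-submit", "--master", master, script, *app_args]


if __name__ == "__main__":
    cmd = spark_submit_cmd(app_args=["--data-dir", "data"])
    # Print a shell-safe version (local[*] gets quoted) to paste into a terminal.
    print(shlex.join(cmd))
    # To launch it directly (requires Spark's bin directory on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Quoting matters because an unquoted `local[*]` can be expanded as a glob by some shells.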

TODO: Clean up the hardcoded paths.

NOTE: This is still a work in progress; the model developed here is for illustration only.