/Cycling_Classification_Project

Using 1.8 million rows of traffic collision data from NYC's Open Data initiative, we ran thousands of classification models to discover which variables made collisions lethal for cyclists. The data was cleaned with pandas, and scikit-learn was used to instantiate and grid-search Decision Tree, Random Forest, Logistic Regression, and K-Nearest Neighbors models. Logistic Regression proved the most reliable and was tuned for recall, because fatal collisions made up only 0.04% of all collisions involving cyclists. The variables most strongly associated with lethal cycling collisions were then extracted as actionable insights.
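The recall-focused tuning described above might look roughly like the sketch below. It is not the project's actual notebook code: the data is synthetic (generated with `make_classification`, with a 2% positive class standing in for the far rarer fatal collisions), and the parameter grid, fold count, and train/test split are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned collision data (assumption):
# 8 numeric features, roughly 2% positive ("fatal") labels.
X, y = make_classification(
    n_samples=5000,
    n_features=8,
    n_informative=4,
    weights=[0.98, 0.02],
    random_state=0,
)

# Stratify so the rare class appears in both the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale features, then fit a logistic regression.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Hypothetical grid: regularization strength and class weighting.
param_grid = {
    "logisticregression__C": [0.01, 0.1, 1.0, 10.0],
    "logisticregression__class_weight": [None, "balanced"],
}

# scoring="recall" selects the combination that misses the fewest positives.
search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="recall",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)

best_model = search.best_estimator_
test_recall = recall_score(y_test, best_model.predict(X_test))

# Rank features by absolute coefficient size (comparable because of scaling).
coefs = best_model.named_steps["logisticregression"].coef_[0]
ranking = np.argsort(-np.abs(coefs))

print("best params:", search.best_params_)
print("test recall:", round(test_recall, 3))
print("feature indices, most to least influential:", ranking.tolist())
```

Optimizing for recall alone tends to favor models that flag many collisions as lethal, so in practice precision would likely be inspected alongside it before any coefficients are read as insights.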

Primary Language: Jupyter Notebook
