Predicting-Ped-Volume: A Jupyter Notebook repository from black-tea

Modeling Pedestrian Volume at Intersections in Los Angeles

Background

The Los Angeles Department of Transportation (LADOT) collects volume data related to bicyclists, pedestrians, and motor vehicles for the purposes of transportation planning. Historically, these data have been stored in a PDF format, which makes it easy to digest a single traffic count, but prevents comparison and analysis of multiple counts. The goal of this project was to (1) convert these data into usable tables so the department can better understand travel trends and (2) construct a model to predict pedestrian activity at an intersection given built environment features. I completed this project as the final capstone requirement for Udacity's Machine Learning Nanodegree.

Data Assembly

Target Variable: LADOT historically collects manual counts at intersections, which include vehicle, bicycle, bus, and pedestrian volumes. I was able to extract 1,600 pedestrian volume counts from these sheets.
Features: pop. (ACS), emp. (ACS), transit ridership (LA Metro), presence of a signal (LADOT), school count within ¼ mi (LAUSD)

Model Construction

Two models: (1) OLS linear regression and (2) decision-tree
Train / Test Split: 70% train, 30% test

Results

Best Model: Decision Tree (2) with R^2 of .56
Important Features: Population & Employment
Error Distribution (see box plot)
- 25% of test samples with error < 25%
- 50% of test samples with error < 49%

File Organization

Initial Project Proposal with Literature Review: project_proposal/MLN Project Proposal.pdf
Notebook: PredictingPedestrianVolumes.ipynb
Data: data/FeatureData
Final PDF Report: pdf/PredictingPedestrianVolumes.pdf
HTML Map of Test Results: html/predmap.html

Related Work

Project to extract data from PDFs
R Shiny Viewer