NYC-Taxi-Demand-Prediction

The primary objective of this project is to build a Real-Time Taxi Demand Prediction Model for every district and zone of NYC.

Primary language: Jupyter Notebook. License: MIT.

[Image: screenshot, 2021-12-29 14:54:14]

The yellow taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.

The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The whole dataset consists of approximately 30 million observations.

Project Structure:

  1. Data exploration and processing
  2. Geo data visualization
  3. Complex seasonal forecasting
  4. Model evaluation
  5. Model implementation
  6. Model enhancement

The main difficulty of the project is that every district of NYC has its own demand dynamics, so we cluster districts with similar dynamics into groups. For each group we then build a separate model with optimal parameters and forecast the demand.
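The grouping step described above might look roughly like the sketch below. It is only an illustration under stated assumptions, not the notebook's actual code: the demand is assumed to be hourly counts per district, each district is summarized by a normalized 168-hour weekly profile, and a small hand-written k-means with farthest-point initialization does the grouping. The function names, the synthetic data and the choice of two clusters are all hypothetical.

```python
import numpy as np

def weekly_profile(hourly_counts):
    """Average demand for each of the 168 hours of the week, scaled to sum to 1."""
    n_weeks = len(hourly_counts) // 168
    weeks = np.asarray(hourly_counts[: n_weeks * 168], dtype=float).reshape(n_weeks, 168)
    profile = weeks.mean(axis=0)
    total = profile.sum()
    return profile / total if total > 0 else profile

def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen centers."""
    centers = [points[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[dist_to_nearest.argmax()])
    return np.array(centers)

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    centers = farthest_point_init(points, k)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic stand-in for real data: 6 districts, 4 weeks of hourly counts.
# Three districts share one daily shape, three share the mirrored shape; volumes differ.
hours = np.arange(4 * 168)
shape_a = 10 + 8 * np.sin(2 * np.pi * (hours % 24) / 24)
shape_b = 10 - 8 * np.sin(2 * np.pi * (hours % 24) / 24)
demand = np.vstack(
    [shape_a * s for s in (1, 5, 20)] + [shape_b * s for s in (2, 7, 30)]
)

profiles = np.vstack([weekly_profile(row) for row in demand])
labels = kmeans(profiles, k=2)
print(labels)
```

Normalizing each profile to sum to 1 makes the grouping depend on the shape of the weekly pattern rather than on total volume, so a quiet district and a busy one with the same rhythm can share a model. In practice a library implementation such as scikit-learn's `KMeans` would be the more usual choice.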

[Image: screenshot, 2021-07-09 00:34:18]

[Image: screenshot, 2021-07-09 00:34:34]

Keep in mind that the time series have complex seasonality with periods of 24 and 168 hours (daily and weekly), so extra features have to be added to the model. Fourier series or dummy variables can be used for this.
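As a minimal sketch of the Fourier option, the snippet below builds sine and cosine columns for the 24-hour and 168-hour periods. The function name `fourier_features` and the number of harmonics per period (3 and 2) are illustrative assumptions, not values taken from the project.

```python
import numpy as np

def fourier_features(t, period, n_harmonics):
    """Return sin/cos columns for harmonics 1..n_harmonics of the given period."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, n_harmonics + 1):
        angle = 2 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)

# Hour index for two weeks of hourly observations.
t = np.arange(2 * 168)

# Daily (24 h) and weekly (168 h) seasonal features side by side.
X = np.hstack([
    fourier_features(t, period=24, n_harmonics=3),
    fourier_features(t, period=168, n_harmonics=2),
])
print(X.shape)
```

Each harmonic adds two columns, so the feature count stays small even for the 168-hour period, whereas one dummy variable per hour of the week would need 167 columns.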

The model itself can be a linear regression or SARIMAX with exogenous features. The models' results are shown below.

[Image: screenshot, 2021-07-09 00:39:15]
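To illustrate the linear-regression option mentioned above, here is a rough sketch that fits ordinary least squares on Fourier terms and forecasts a 24-hour hold-out. Everything in it is assumed for illustration: the series is synthetic, and the helper `design_matrix`, the number of harmonics and the train/hold-out split are hypothetical, so the printed error says nothing about the project's real results.

```python
import numpy as np

def design_matrix(t, periods=(24, 168), n_harmonics=2):
    """Intercept plus sin/cos Fourier terms for each seasonal period."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for period in periods:
        for k in range(1, n_harmonics + 1):
            angle = 2 * np.pi * k * t / period
            cols += [np.sin(angle), np.cos(angle)]
    return np.column_stack(cols)

# Synthetic hourly demand: daily and weekly cycles plus noise (5 weeks).
rng = np.random.default_rng(0)
t = np.arange(5 * 168)
y = (
    50
    + 20 * np.sin(2 * np.pi * t / 24)
    + 10 * np.cos(2 * np.pi * t / 168)
    + rng.normal(0, 1, t.size)
)

# Fit on the first 4 weeks, hold out the following 24 hours.
train = slice(0, 4 * 168)
holdout = slice(4 * 168, 4 * 168 + 24)

coef, *_ = np.linalg.lstsq(design_matrix(t[train]), y[train], rcond=None)
forecast = design_matrix(t[holdout]) @ coef

mae = np.mean(np.abs(forecast - y[holdout]))
print(f"MAE over the 24-hour hold-out: {mae:.2f}")
```

For the SARIMAX option, the same Fourier columns (without the intercept) could be passed through the `exog` argument of statsmodels' `SARIMAX`, leaving the ARIMA part to model what the seasonal terms do not explain.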

Because we work with geo data, the forecasts can be visualized on a map, either statically or dynamically.

Static visualization of NYC districts:

[Image: screenshot, 2021-07-08 22:18:37]

[Image: screenshot, 2021-07-08 22:18:07]

Dynamic visualization of forecasts for every NYC district at a selected date and time:

[Image: sss]