nyc_taxi_analysis

Preprocessing and analyzing NYC yellow taxi passenger counts for 2022 and 2023 using Isolation Forest.

Steps

  1. Download the queried datasets (2022dataset and 2023dataset) as CSV files into the data/ folder.
  2. Install the required packages: pip install -r requirements.txt
  3. Run all cells in preprocess.ipynb. The notebook first sums the number of passengers per hour and cleans the data, removing problematic records from the original dataset. It then keeps only the maximum hourly passenger count for each day, which further reduces the data size. Finally, it saves the processed data as CSV files.
  4. Run the anomaly detection script: python anomoly_detection.py
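
The pipeline described in steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code in preprocess.ipynb or anomoly_detection.py. It assumes the CSV exposes the TLC column names tpep_pickup_datetime and passenger_count, that "problematic data" means missing or non-positive passenger counts, and that scikit-learn's IsolationForest is used with a contamination of 0.05. The function names daily_peak_passengers and flag_anomalies are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def daily_peak_passengers(trips: pd.DataFrame) -> pd.Series:
    """Sum passengers per hour, then keep each day's busiest hour."""
    # Assumed cleaning rule: drop rows with missing or non-positive counts.
    clean = trips.dropna(subset=["tpep_pickup_datetime", "passenger_count"])
    clean = clean[clean["passenger_count"] > 0]
    hourly = (
        clean.set_index("tpep_pickup_datetime")["passenger_count"]
        .sort_index()
        .resample("h")
        .sum()
    )
    # One value per day: the maximum hourly passenger total.
    return hourly.resample("D").max()


def flag_anomalies(daily_peak: pd.Series, contamination: float = 0.05) -> pd.Series:
    """Return a boolean Series marking days that Isolation Forest labels as outliers."""
    model = IsolationForest(contamination=contamination, random_state=0)
    # fit_predict returns -1 for outliers and 1 for inliers.
    labels = model.fit_predict(daily_peak.to_numpy().reshape(-1, 1))
    return pd.Series(labels == -1, index=daily_peak.index, name="is_anomaly")


# Synthetic stand-in for the real CSV: one trip per hour for 60 days.
rng = np.random.default_rng(0)
hours = pd.date_range("2022-01-01", periods=60 * 24, freq="h")
trips = pd.DataFrame(
    {
        "tpep_pickup_datetime": hours,
        "passenger_count": rng.integers(1, 7, size=len(hours)).astype(float),
    }
)
# Inject an artificial surge of 100 extra trips in a single hour.
surge = pd.DataFrame(
    {
        "tpep_pickup_datetime": [pd.Timestamp("2022-02-10 18:00")] * 100,
        "passenger_count": [4.0] * 100,
    }
)
# Two bad rows (zero and missing counts) that the cleaning step should drop.
bad = pd.DataFrame(
    {
        "tpep_pickup_datetime": [pd.Timestamp("2022-01-05 09:00")] * 2,
        "passenger_count": [0.0, np.nan],
    }
)
trips = pd.concat([trips, surge, bad], ignore_index=True)

daily_peak = daily_peak_passengers(trips)
is_anomaly = flag_anomalies(daily_peak)
print(daily_peak[is_anomaly])
```

With real data, the synthetic frame would be replaced by something like pd.read_csv on a file in data/ with parse_dates=["tpep_pickup_datetime"]. The contamination value controls what fraction of days is flagged and would need tuning against the actual distribution.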

Results

[Figure: result]