- Name: Xavier Travers
- Student ID: 1178369
This project aims to determine the effect that viral case rates (COVID-19 and influenza) have on the distance of New York yellow taxi trips, per borough per week.
The research timeline runs from January 2020 to December 2021 (see the report for justification).
Run all the scripts from the repository's root directory (do not `cd` into the `scripts` folder).
- `download.py`: Downloads the raw data into the `data/raw` directory. Run with `python3 ./scripts/download.py`
- `generate_mmwr_weeks.py`: Generates a `data/raw/mmwr_weeks.parquet` file, which is used for aggregation by week (the influenza data is already grouped by CDC/MMWR week). Run with `python3 ./scripts/generate_mmwr_weeks.py`
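To illustrate what grouping by MMWR week involves, here is a minimal sketch that maps a calendar date to its MMWR year and week. It assumes the CDC definition (weeks run Sunday to Saturday, and week 1 is the first week with at least four days in the calendar year). The function names `mmwr_week_one_start` and `mmwr_year_week` are hypothetical, and this is not necessarily how `generate_mmwr_weeks.py` does it.

```python
from datetime import date, timedelta
from typing import Tuple


def mmwr_week_one_start(year: int) -> date:
    """Return the Sunday that starts MMWR week 1 of the given year.

    Week 1 must hold at least four days of the new year, which is
    equivalent to saying that week 1 is the week containing January 4.
    """
    jan4 = date(year, 1, 4)
    # date.weekday(): Monday=0 ... Sunday=6, so this counts days since Sunday.
    days_since_sunday = (jan4.weekday() + 1) % 7
    return jan4 - timedelta(days=days_since_sunday)


def mmwr_year_week(day: date) -> Tuple[int, int]:
    """Return (mmwr_year, mmwr_week) for a calendar date."""
    # A late-December date can belong to next year's week 1, and an
    # early-January date can belong to last year's final week.
    for year in (day.year + 1, day.year, day.year - 1):
        start = mmwr_week_one_start(year)
        if day >= start:
            return year, (day - start).days // 7 + 1
    raise ValueError("unreachable for valid dates")
```

Note that under this definition 1 January 2021 falls in week 53 of MMWR year 2020, so the MMWR year can differ from the calendar year near the year boundary.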
- `notebooks/preprocessing/preprocessing_part_1_cleaning.ipynb`: Cleans the dataset (removes rows containing `null` and negative values where necessary).
- `notebooks/preprocessing/preprocessing_part_2_aggregation.ipynb`: Groups the datasets by MMWR week and pick-up borough.
- The data analysis notebooks: These can be explored in any order (they do not change the data; they only generate plots).
  - `notebooks/data_analysis/data_analysis_distance_distribution.ipynb`: Finds the distribution of trip distances.
  - `notebooks/data_analysis/data_analysis_distance_vs_time.ipynb`: Plots the trip distances over time.
  - `notebooks/data_analysis/data_analysis_geospatial_distance_mapping.ipynb`: Maps the average trip radii per borough.
  - `notebooks/data_analysis/data_analysis_viral_cases_vs_time.ipynb`: Plots the viral case rates over time.
  - `notebooks/data_analysis/data_analysis_distance_modelling.ipynb`: Generates the linear models of trip distances.
  - `notebooks/data_analysis/data_analysis_trip_rates_vs_time.ipynb`: Plots the trip rates over time. This is not used in the report.
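The cleaning and aggregation steps can be sketched on a toy dataset. This is a plain-Python stand-in for illustration only: the notebooks work on the real data with the libraries listed under the dependencies, and the record layout, the sample values, the choice of the mean as the aggregate, and the function names `clean` and `mean_distance_by_week_and_borough` are all assumptions.

```python
from collections import defaultdict

# Hypothetical trip records: (mmwr_week, pickup_borough, trip_distance).
trips = [
    (1, "Manhattan", 2.0),
    (1, "Manhattan", 4.0),
    (1, "Queens", None),   # null distance: removed by cleaning
    (1, "Queens", -3.0),   # negative distance: removed by cleaning
    (1, "Queens", 10.0),
    (2, "Manhattan", 3.0),
]


def clean(rows):
    """Drop rows that contain a null field or a negative distance."""
    return [row for row in rows if None not in row and row[2] >= 0]


def mean_distance_by_week_and_borough(rows):
    """Group rows by (week, borough) and average the trip distance."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [distance sum, row count]
    for week, borough, distance in rows:
        totals[(week, borough)][0] += distance
        totals[(week, borough)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


weekly = mean_distance_by_week_and_borough(clean(trips))
```

Cleaning before aggregating matters here: without it, the negative Queens record would pull that group's mean down, and the null record would break the sum.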
There are several scripts in the `scripts` folder.
They are commented thoroughly enough that a breakdown of each is not needed here.
The following modules are used throughout the code and should be installed before running it.
For a more detailed snapshot of the modules I had installed when running my code, see `requirements.txt`.
- `pyspark`
- `pandas`
- `matplotlib`
- `statsmodels`
- `geopandas`
- `folium`
- `numpy`
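One quick way to confirm that these modules are available before running anything is sketched below, using only the standard library. The helper name `missing_modules` is hypothetical, and the check only tests that each module can be found; it does not verify the versions pinned in `requirements.txt`.

```python
from importlib.util import find_spec

# Module names taken from the list above.
REQUIRED_MODULES = [
    "pyspark",
    "pandas",
    "matplotlib",
    "statsmodels",
    "geopandas",
    "folium",
    "numpy",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```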