- Name: Sen Turner
- Student ID: 1168692
Research Goal: My research goal is to analyse how drivers' average fare and the number of available trips change depending on pickup location, hour of the day, day of the week and weather.
Timeline: The research covers the period from February 2019 to January 2020.
To run the pipeline, go to the `scripts` and `notebooks` directories and run the files in this order:

1. `weather_scraper.py`: Downloads the weather data into the `data/raw` directory. The script is taken from https://github.com/Karlheinzniebuhr/the-weather-scraper
2. `preprocessing_notebook_part_1.ipynb`: Downloads the raw High Volume FHV data and outputs it to the `data/raw` directory.
3. `preprocessing_notebook_part_2.ipynb`: Details all the pre-processing and aggregation steps, including outlier detection and removal, that produce one large dataset, then outputs the dataset to the `data/curated` directory.
4. `data_analysis_average_fare.ipynb`: Analyses how the average fare is affected by pickup location, hour of the day, day of the week and weather.
5. `data_analysis_trips.ipynb`: Analyses how the number of trips is affected by pickup location, hour of the day, day of the week and weather.
6. `model_num_trips_nn.ipynb` followed by `model_avg_pay_nn`: These notebooks train and test a neural network to predict the number of trips and the average driver pay in any given hour, and output the predictions and true values of the test set to the `data/curated` directory.
7. `model_num_trips_rf.ipynb` followed by `model_avg_pay_rf`: These notebooks train and test a random forest model to predict the number of trips and the average driver pay in any given hour, and output the predictions to the `data/curated` directory.
8. `model_analysis.ipynb`: Creates visualisations comparing the performance of the models.
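The run order above could be automated with a small driver script. This is a sketch only and is not part of the repository. It assumes that `weather_scraper.py` sits in `scripts/`, that every notebook sits in `notebooks/`, that `model_avg_pay_nn` and `model_avg_pay_rf` are `.ipynb` files, and that it is launched from the repository root with Jupyter installed. The scraper may expect a particular working directory, so check where it writes before relying on this. By default the script only prints the commands. It runs them only when `--run` is passed.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Assumed layout: the scraper lives in scripts/ and every notebook in notebooks/.
SCRIPT = Path("scripts") / "weather_scraper.py"
NOTEBOOKS = [
    "preprocessing_notebook_part_1.ipynb",
    "preprocessing_notebook_part_2.ipynb",
    "data_analysis_average_fare.ipynb",
    "data_analysis_trips.ipynb",
    "model_num_trips_nn.ipynb",
    "model_avg_pay_nn.ipynb",   # extension assumed; the README lists it without one
    "model_num_trips_rf.ipynb",
    "model_avg_pay_rf.ipynb",   # extension assumed; the README lists it without one
    "model_analysis.ipynb",
]


def build_commands() -> List[List[str]]:
    """Return the pipeline commands in the order the README prescribes."""
    commands = [[sys.executable, str(SCRIPT)]]
    for name in NOTEBOOKS:
        # Execute each notebook and save its outputs back into the same file.
        commands.append([
            "jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", str(Path("notebooks") / name),
        ])
    return commands


def run_pipeline(dry_run: bool = True) -> None:
    """Print every command; execute them only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run="--run" not in sys.argv)
```

A dry run first makes it easy to confirm the order and paths before spending time on the long preprocessing and training notebooks.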
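The download step needs one High Volume FHV file for each month in the February 2019 to January 2020 window. The URL pattern below is an assumption based on the Parquet files listed on the NYC TLC trip record data page. `preprocessing_notebook_part_1.ipynb` may use a different source or file format.

```python
from datetime import date
from typing import List

# Assumed NYC TLC location and naming scheme for the monthly HVFHV files.
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def months_in_range(start: date, end: date) -> List[str]:
    """Return 'YYYY-MM' keys for every month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return months


def fhvhv_urls(months: List[str]) -> List[str]:
    """Build one (assumed) download URL per month key."""
    return [f"{BASE_URL}/fhvhv_tripdata_{m}.parquet" for m in months]


MONTHS = months_in_range(date(2019, 2, 1), date(2020, 1, 1))
```

`MONTHS` holds the 12 keys from `"2019-02"` to `"2020-01"`. Generating the list from the start and end dates keeps the download loop in step with the stated timeline.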
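The README does not say which outlier rule or aggregation code `preprocessing_notebook_part_2.ipynb` uses. This sketch shows one plausible version of the step and should not be read as the notebook's method. It drops driver-pay outliers with Tukey's IQR fences, which is a swapped-in technique. It then groups trips by pickup zone and hour to get the number of trips and the average driver pay. The field names `pickup_datetime`, `PULocationID` and `driver_pay` are assumptions modelled on the TLC HVFHV schema.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, quantiles
from typing import Dict, List, Tuple


def iqr_bounds(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] count as outliers."""
    q1, _, q3 = quantiles(values, n=4)  # needs at least two values
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def hourly_summary(trips: List[dict]) -> Dict[Tuple[int, datetime], dict]:
    """Group trips by (pickup zone, hour) into trip counts and mean driver pay."""
    low, high = iqr_bounds([t["driver_pay"] for t in trips])
    groups = defaultdict(list)
    for t in trips:
        if not (low <= t["driver_pay"] <= high):
            continue  # drop pay outliers before aggregating
        # Truncate the pickup time to the start of its hour.
        hour = t["pickup_datetime"].replace(minute=0, second=0, microsecond=0)
        groups[(t["PULocationID"], hour)].append(t["driver_pay"])
    return {
        key: {
            "num_trips": len(pays),
            "avg_driver_pay": mean(pays),
            "hour_of_day": key[1].hour,
            "day_of_week": key[1].weekday(),  # Monday = 0
        }
        for key, pays in groups.items()
    }
```

The hour-of-day and day-of-week fields mirror the conditions studied in the analysis notebooks. Hourly weather readings could be joined onto the same hourly key.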
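`model_analysis.ipynb` compares the models, but the README does not name the metrics it uses. The sketch below assumes mean absolute error (MAE) and root mean squared error (RMSE), which are common choices for numeric targets such as trip counts and average pay. It also assumes the true and predicted values have already been loaded from `data/curated` as plain sequences. The function names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE and RMSE between two equal-length, non-empty series."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


def rank_models(results: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> List[Tuple[str, float]]:
    """Map model name -> (y_true, y_pred); return (name, rmse) pairs, best first."""
    scores = [(name, regression_metrics(t, p)["rmse"]) for name, (t, p) in results.items()]
    return sorted(scores, key=lambda pair: pair[1])
```

For example, `regression_metrics([10, 20, 30], [12, 18, 30])` gives an MAE of 4/3 and an RMSE of sqrt(8/3). RMSE penalises large misses more heavily than MAE, so reporting both shows whether one model's errors are concentrated in a few bad hours.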