leobxpan/taxi-prediction

Course project for CS229. Implemented with Keras and PyTorch

taxi-prediction

AdaBoost Trees.ipynb -- Description: Tests AdaBoost decision trees on our dataset. Because we have a low feature count, we expected this to yield better results than random forests, but it did not.

Bucketing Analysis.ipynb -- Description: We needed to decide which labels, and how many, our model should use. We first tried k-means, but the results were poor, so we hand-crafted the labels by plotting the values and visually inspecting the distributions.

Bucketing Script.ipynb -- Description: The script that quickly buckets our dataset.
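
As a rough illustration of what bucketing with hand-picked labels can look like, here is a minimal sketch. The edge values and the function names `assign_bucket` and `bucket_all` are hypothetical and not taken from the notebooks; the real edges were chosen by inspecting the plotted distributions.

```python
from bisect import bisect_right

# Hypothetical bucket edges; the project's real edges are not listed here.
BUCKET_EDGES = [5.0, 10.0, 20.0, 40.0]


def assign_bucket(value, edges=BUCKET_EDGES):
    """Return the index of the bucket that `value` falls into.

    Bucket 0 holds values below edges[0]; bucket len(edges) holds
    values at or above the last edge. `edges` must be sorted.
    """
    return bisect_right(edges, value)


def bucket_all(values, edges=BUCKET_EDGES):
    """Bucket every value in an iterable and return a list of labels."""
    return [assign_bucket(v, edges) for v in values]
```

With four edges this produces five class labels (0 through 4), which can then be used as classification targets.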

Cluster ID to Location ID Conversion.ipynb -- Description: We lost track of the Location IDs in our datasets, so this script links the cluster IDs back to the Location IDs.

Cluster Polygon Generation.ipynb -- Description: We looked into generating polygons for our cluster IDs using Voronoi diagrams so that we could plot the cluster geometries on the heat map, but this proved too difficult.

Projection to mercator Projection.ipynb -- Description: The geolocation boundaries we were given for plotting were not in the projection required to plot on a map. This script converts them to the Mercator projection, which is used in web-based map plotting.
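
For reference, the spherical Web Mercator formulas can be sketched as below. This assumes the input is WGS84 longitude/latitude in degrees, which may not match the projection of the boundaries we were given; converting from another coordinate system would need a projection library. The function name `lonlat_to_web_mercator` is hypothetical.

```python
import math

# Sphere radius used by Web Mercator (the WGS84 semi-major axis), in meters.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS84 longitude/latitude in degrees to Web Mercator meters.

    Assumes |lat_deg| < 90; the projection is undefined at the poles.
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y
```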

Random Forest Regressors-Classifications Test.ipynb -- Description: We tested regression and classification random forests to compare the results.

Scratch.ipynb -- Description: Just a scratch pad to test some ideas before we actually ran anything.

bucket_label.py -- Description:

create_location_distribution.ipynb -- Description:

create_pickles.py -- Description: convert .csv files to pickle binary files

custom_loss.py -- Description: trial custom loss function to be applied to LSTM and FCNN

date_id_fetch.ipynb -- Description: initial data pre-processing file (not used ultimately)

df2arr.py -- Description: construct data samples (sequential and non-sequential) in numpy array form from the .pickle file containing the original dataframes.

How to run it:

Set pickle_path to the path of the .pickle file on your computer;
Call arr2sample() if you want to generate non-sequential samples; call arr2seq() otherwise;
Run python df2arr.py.
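
The difference between the two sample types can be sketched as follows. This is not the code in df2arr.py: the helper names `make_flat_samples` and `make_sequences` are hypothetical, and the sketch assumes a 2-D array whose last column is the target and whose rows are in time order.

```python
import numpy as np


def make_flat_samples(data):
    """Non-sequential: every row is one sample, last column is the target."""
    X = data[:, :-1]
    y = data[:, -1]
    return X, y


def make_sequences(data, seq_len):
    """Sequential: each sample is `seq_len` consecutive rows of features.

    The target is the last column of the row right after the window.
    """
    n = data.shape[0] - seq_len
    if n <= 0:
        raise ValueError("need more rows than seq_len")
    X = np.stack([data[i:i + seq_len, :-1] for i in range(n)])
    y = data[seq_len:, -1]
    return X, y
```

The flat samples have shape (rows, features), which suits a fully connected network; the sequences have shape (samples, seq_len, features), the layout a Keras LSTM layer expects.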

fare-distributions.ipynb -- Description: Same thing as Random Forest Regressors-Classifications Test.ipynb

fc_keras.py -- Description: FCNN in Keras.

fetch_coords_id.ipynb -- Description: Converts lat/long coords to mercator and then checks where the coordinates fall in the set of all location ids.
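
One common way to check which zone a projected point falls in is a ray-casting point-in-polygon test. The sketch below is an assumption about how such a lookup can work, not the notebook's actual code; `point_in_polygon`, `find_location_id`, and the `zones` mapping are hypothetical names.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon`.

    `polygon` is a list of (x, y) vertices in order. Points exactly on
    an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_location_id(x, y, zones):
    """Return the first location ID whose polygon contains the point."""
    for location_id, polygon in zones.items():
        if point_in_polygon(x, y, polygon):
            return location_id
    return None
```

Here `zones` maps each location ID to its boundary vertices, expressed in the same projected coordinates as the point being tested.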

getFreq.ipynb -- Description: Scratch file used to test grouping

gpu_tutorial.py -- Description:

heat_map_generator.ipynb -- Description: Generates heat maps for the data. The input is a vector with one value per location ID, holding whatever quantity you want to plot.

irisDataset.py -- Description: In order to test whether the implementation of our FCNN is correct, we test it on the classical Iris dataset. Ignore this file.

k_means_validation.ipynb -- Description: script to validate our k-means clustering. Takes a 95%/5% split and calculates the classification accuracy.

location_k_means.ipynb -- Description: This is used to generate the kmeans plots and generate the clusters for our dataset.

lstm.py -- Description: LSTM in Keras.

main.py -- Description: FCNN in PyTorch which is buggy. Ignore this file.

projection_to_mercator.ipynb -- Description: See Projection to mercator Projection.ipynb above.

sample_dataset.py -- Description: sample 1% data from each month from July 2014 to June 2018 and save as a pickle binary file
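
Per-month sampling of this kind can be sketched with the standard library alone. This is only an illustration: the repository appears to work with dataframes, and the names `sample_per_month` and `month_of`, the fixed seed, and the rule of keeping at least one row per month are all assumptions.

```python
import random
from collections import defaultdict


def sample_per_month(rows, month_of, frac=0.01, seed=0):
    """Draw roughly `frac` of the rows from each month, without replacement.

    `month_of` is a function mapping a row to its month key.
    """
    rng = random.Random(seed)
    by_month = defaultdict(list)
    for row in rows:
        by_month[month_of(row)].append(row)

    sampled = []
    for month in sorted(by_month):
        group = by_month[month]
        # Assumption: keep at least one row even for very small months.
        k = max(1, round(len(group) * frac))
        sampled.extend(rng.sample(group, k))
    return sampled
```

Sampling within each month, rather than across the whole dataset, keeps every month represented in proportion to its size.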

svr.py -- Description: SVR using Scikit-Learn.

taxiDataset.py -- Description: defines a PyTorch dataset class for the input data (used for the PyTorch model, which was ultimately not used)

test_lstm.py -- Description: Test LSTM using a given model weight file.