This archive contains the various notebooks (within /notebooks) that I used to perform my analysis for this project. I have provided them as both ipynb
(Jupyter Notebooks) and pdf
files. Datasets have been provided under /data.
If you wish to run these notebooks yourself, there are a few requirements to do so:
- Python 3.6
- Once this is installed run:
pip3 install -r requirements.txt
- Alternatively, you can install Anaconda to obtain these packages.
- Once this is installed run:
- R
- Jupyter notebook kernels for both Python3.6 and R
- R Driver
- Python3.6 kernel
python3 -m pip install ipykernel
python3 -m ipykernel install --user
The data used for this analysis has been provided by Citibike NYC. I have included the datasets (within /data
) that I utilized as a part of my analysis. One dataset that was too large (~500mb compressed) was the 2017 trips dataset, which can be downloaded. Note that the notebooks have been designed to read the csv files directly as zip
archives, therefore no need to unzip
.
citibike_2017.csv.zip
: this dataset is a collection of the 2017 citibike trip dataset.- As noted above this file has been assembed and is hosted on my Google Drive.
- The scripts used to build the dataset are
download_data.py
andmerge.sh
for reference.
citibike_stations_2017.csv.zip
: Data extracted from The Open Bus project for 2017.rebalanced_bikes.csv.zip
: This file is a subset of the trips dataset (citibike_2017.csv.zip), which consists of the rebalancing trips made for 2017.work_cluster.csv
: Stations that are members of the "work cluster"residential_cluster.csv
: Stations that are members of the "residential cluster"low_cluster.csv
: Stations that are members of the "low cluster"ridership/
: This folder consists of Aggregated ridership and membership information from May 2013 - December 2017.
I chose to conduct my analysis in Jupyter notebooks to easily fuse the code with explainations and visualizations. I have provided all notebooks as ipynb
sources and pdf
files.
construct_rebalancing_trips
: This notebook originally constructed the rebalanced bikes dataset from the 2017 trip data (both have been assembled and provided as the above section).station_clusters
: This notebook examinescitibike_stations_2017.csv.zip
(sourced from the Open Bus Project) to produce a set of clusters as csv files.ridership_forecasting
: This notebook examines the daily ridership information (ridership/
) to produce forecasting models for demand over time.balancing_predictions
: This notebook contains various models to mimic Citibike's current balancing behaviour.