MAST30034-Project-1

Primary language: HTML. License: MIT.

The University of Melbourne - MAST30034 (Applied Data Science)

Project 1 - Quantitative Analysis

Note: This is a copy of the original project repository. The original repository is kept private and is available upon request.

Introduction

This project presents a quantitative analysis of the New York City Taxi and Limousine Commission (TLC) Trip Record Data. The dataset covers trips taken in various types of taxi and for-hire vehicle services in the New York City area.

Dependencies

  • Language: Python 3.8.3
  • Packages / Libraries: pandas, pyspark, numpy, scikit-learn (imported as sklearn), geopandas, matplotlib, folium
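
Before running the notebooks, it can help to confirm that the packages listed above are importable in the current environment. The following is a minimal sketch, not part of the original repository: the helper name `missing_packages` is hypothetical, and the list uses import names (so `sklearn` rather than `scikit-learn`).

```python
import importlib.util
import sys

# Import names of the packages listed under Dependencies.
# Assumption: these are the top-level import names used by the notebooks.
REQUIRED = ["pandas", "pyspark", "numpy", "sklearn",
            "geopandas", "matplotlib", "folium"]


def missing_packages(names):
    """Return the subset of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report the interpreter version and any packages that still need installing.
print("Python", sys.version.split()[0])
missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only looks up each package without importing it, so it is quick and does not start Spark.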

Datasets

Directory

  • raw_data: Contains all the raw data files.
  • preprocessed_data: Contains all the preprocessed data files.
  • plots: Figures produced by the notebooks are saved here.
  • code: Contains all notebooks and scripts, with one notebook per stage. Run them in this order:
    1. Run preprocessing.ipynb to download and preprocess the data.
    2. Run visualisation.ipynb for visualisation and exploratory data analysis.
    3. Run modelling.ipynb for machine learning modelling.
  • deprecated: A folder to store "old code".
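
The three notebooks in the code folder can also be executed in order from a script rather than opened one by one. The sketch below is an illustration under stated assumptions, not something shipped with the repository: it assumes the `jupyter` command with `nbconvert` is installed (neither is in the dependency list above) and that it is run from the repository root. The names `build_commands` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path

# Notebook order taken from the instructions above.
NOTEBOOKS = ["preprocessing.ipynb", "visualisation.ipynb", "modelling.ipynb"]


def build_commands(code_dir="code", notebooks=NOTEBOOKS):
    """Build one `jupyter nbconvert` command per notebook, in run order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(Path(code_dir) / name)]
        for name in notebooks
    ]


def run_all(code_dir="code"):
    """Execute each notebook in turn and stop at the first failure."""
    for cmd in build_commands(code_dir):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if a notebook fails to execute.
        subprocess.run(cmd, check=True)
```

Calling `run_all()` executes the notebooks in place, so their saved outputs are overwritten. Stopping at the first failure matters because the later stages depend on the preprocessed data from the first.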