Citibike ML Prediction

Overview

This repository contains the solution for Assignment 2 of CS4225 Big Data Systems for Data Science (AY23/24 Semester 1). The assignment is to predict daily Citi Bike trip counts in New York City from Citi Bike trip data and weather data. The solution builds a machine learning pipeline with Spark on Databricks.

Datasets

  • Citi Bike Data: Trip data for 2022 is used for training, and trip data from January to July 2023 is used for testing.
  • Weather Data: Weather data is obtained from Visual Crossing.
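
The date-based split described above can be illustrated with a small plain-Python sketch. This is not the repository's Spark code: the sample timestamps, the timestamp format, and the helper names `daily_trip_counts` and `split_by_period` are all assumptions made for illustration. It only shows the idea of aggregating trips into daily counts and then separating the 2022 training period from the January to July 2023 testing period.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical sample of trip start timestamps (assumed format);
# the real Citi Bike files contain many more rows and columns.
SAMPLE_STARTS = [
    "2022-03-01 08:15:00",
    "2022-03-01 17:40:00",
    "2022-12-31 23:59:00",
    "2023-01-01 00:05:00",
    "2023-07-31 12:00:00",
    "2023-08-01 09:00:00",  # outside both periods, so it is dropped
]

# Period boundaries taken from the dataset description.
TRAIN_START, TRAIN_END = date(2022, 1, 1), date(2022, 12, 31)
TEST_START, TEST_END = date(2023, 1, 1), date(2023, 7, 31)


def daily_trip_counts(start_times):
    """Count trips per calendar day from 'YYYY-MM-DD HH:MM:SS' strings."""
    days = (
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date() for ts in start_times
    )
    return Counter(days)


def split_by_period(counts):
    """Split daily counts into training and testing dictionaries by date."""
    train = {d: n for d, n in counts.items() if TRAIN_START <= d <= TRAIN_END}
    test = {d: n for d, n in counts.items() if TEST_START <= d <= TEST_END}
    return train, test


train, test = split_by_period(daily_trip_counts(SAMPLE_STARTS))
```

Splitting by date rather than at random keeps every test day strictly later than every training day, which matches how the two periods are described above. In a Spark pipeline the same logic would typically be a group-by on the trip date followed by date filters.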