NYC_Pyspark: A Jupyter Notebook repository from ashmita23

NYC Rideshare Data Modelling using PySpark

In this project, I used 80 GB of NYC rideshare data to run a regression analysis of base passenger fares with the gradient-boosted trees algorithm. I used Google Cloud Platform (GCP) to store and process the datasets. I also modelled the data as a graph network and applied the PageRank algorithm to identify the most densely populated areas. The analysis gives insight into fare structures and passenger distribution, which can support data-driven decisions in urban planning and rideshare services.
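
To make the PageRank step concrete, here is a minimal in-memory sketch in plain Python. It is not the notebook's code: the project presumably ran PageRank on Spark, while this version uses simple power iteration on a tiny graph. The zone names, the trip counts, and the `pagerank` helper are all hypothetical and chosen only for illustration. Each edge is a (pickup zone, drop-off zone, trip count) tuple, and a zone's rank grows with the share of trips flowing into it from other highly ranked zones.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (illustrative helper).

    edges: iterable of (pickup_zone, dropoff_zone, trip_count) tuples.
    Returns a dict mapping each zone to its rank; the ranks sum to 1.
    """
    out_weight = defaultdict(float)   # total trips leaving each zone
    incoming = defaultdict(list)      # zone -> [(source zone, trip count)]
    nodes = set()
    for src, dst, weight in edges:
        nodes.update((src, dst))
        out_weight[src] += weight
        incoming[dst].append((src, weight))

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by zones with no outgoing trips is spread evenly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {}
        for node in nodes:
            inflow = sum(
                rank[src] * weight / out_weight[src]
                for src, weight in incoming[node]
            )
            new_rank[node] = (1 - damping) / n + damping * (inflow + dangling / n)
        rank = new_rank
    return rank


# Hypothetical trip counts between made-up zones (not taken from the dataset).
trips = [
    ("Midtown", "JFK", 120),
    ("JFK", "Midtown", 100),
    ("Harlem", "Midtown", 80),
    ("Midtown", "Harlem", 30),
    ("Astoria", "Midtown", 60),
]

ranks = pagerank(trips)
top_zone = max(ranks, key=ranks.get)
for zone, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{zone}: {score:.3f}")
```

In this toy graph the zone that receives trips from every other zone ends up with the highest rank. On the full dataset the same idea would need a distributed implementation, since the edge list would not fit comfortably in memory on one machine.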
