/NYC_Pyspark

Primary Language: Jupyter Notebook

NYC Rideshare Data Modelling Using PySpark

In this project, I used 80 GB of NYC rideshare data to run a regression analysis of base passenger fares with gradient-boosted trees (GBT). Google Cloud Platform (GCP) handled storage and processing for the large datasets. I also built graph networks and applied the PageRank algorithm to identify the most densely populated areas. The analysis gives insight into fare structures and passenger distribution, which can support data-driven decisions in urban planning and rideshare services.
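To illustrate the PageRank step, here is a minimal sketch in plain Python rather than the PySpark code used in the notebooks. It treats each zone as a node and each pickup-to-dropoff flow as a weighted directed edge, then ranks zones by power iteration. The zone names, trip counts, and the `pagerank` helper are hypothetical and only show the idea; positive edge weights are assumed.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    edges: list of (source, destination, weight) tuples with weight > 0.
    Returns a dict mapping each node to its rank; ranks sum to 1.
    """
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    n = len(nodes)

    # Total outgoing weight per node, used to normalise edge weights.
    out_weight = defaultdict(float)
    for src, _, w in edges:
        out_weight[src] += w

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes its rank along its edges in proportion to weight.
        for src, dst, w in edges:
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical trip counts between four made-up zones (not project data).
trips = [
    ("A", "B", 10),
    ("B", "C", 5),
    ("B", "A", 5),
    ("C", "A", 3),
    ("D", "A", 8),
]

ranks = pagerank(trips)
top_zone = max(ranks, key=ranks.get)
for zone, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{zone}: {score:.4f}")
```

In this toy graph, zone A receives flow from three other zones and ends up with the highest rank. On the full dataset the same computation would need a distributed graph library, since a single-machine loop like this one does not scale to 80 GB of trips.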