Data Engineering Project with GCP | Uber Data Analytics

Introduction


In this project, I built an ETL pipeline on GCP to analyze Uber data. I used Cloud Storage to hold the raw data, Python for pre-processing, a Compute Engine instance with a Mage data pipeline to run the scripts and transform the data, BigQuery as the data warehouse, and Looker Studio for data visualization. Together, these cover extraction, transformation, and loading, and the resulting tables feed the visualizations used for analysis.

Architecture


[Image: architecture diagram]

Technology Used


  • Programming language: Python
  • Google Cloud Storage
  • Compute Engine (VM instances)
  • BigQuery
  • Looker Studio
  • Data pipeline tool: mage.ai (open source project: https://github.com/mage-ai/mage-ai)

Dataset Used


The TLC Trip Record Data for yellow and green taxis includes details such as pick-up and drop-off times, locations, trip distances, fares, rate types, payment methods, and passenger counts reported by the driver. (https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
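As a rough illustration of the fields listed above, the sketch below parses a two-row CSV sample into typed Python records using only the standard library. The column names (`tpep_pickup_datetime`, `tpep_dropoff_datetime`, and so on) follow the yellow-taxi naming in the TLC data dictionary, but they are an assumption about the file this project uses. The sample rows are invented, and `parse_trips` is a hypothetical helper, not part of this project's code.

```python
import csv
import io
from datetime import datetime

# Illustrative sample only; the values are invented, not taken from the dataset.
SAMPLE_CSV = """tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,payment_type,fare_amount
2016-03-01 00:00:00,2016-03-01 00:07:55,1,2.50,1,9.0
2016-03-01 00:10:00,2016-03-01 00:31:06,2,7.10,2,24.5
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_trips(text):
    """Parse CSV text into a list of dicts with typed values."""
    trips = []
    for row in csv.DictReader(io.StringIO(text)):
        pickup = datetime.strptime(row["tpep_pickup_datetime"], TIME_FORMAT)
        dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIME_FORMAT)
        trips.append({
            "pickup": pickup,
            "dropoff": dropoff,
            "passenger_count": int(row["passenger_count"]),
            "trip_distance": float(row["trip_distance"]),
            "payment_type": int(row["payment_type"]),
            "fare_amount": float(row["fare_amount"]),
            # Derived field: trip duration in minutes.
            "duration_min": (dropoff - pickup).total_seconds() / 60,
        })
    return trips


trips = parse_trips(SAMPLE_CSV)
```

Casting the timestamps and numeric fields once at this stage keeps later steps free of string handling.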

Data Model


[Image: data model diagram]
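The diagram is not reproduced in text here, so the following is only a sketch of one common way to model trip data: a fact table that references small dimension tables through surrogate keys. The star-schema layout, the table and column names (`datetime_dim`, `fact_table`, `datetime_id`, `payment_type_id`), and the sample trips are all assumptions made for illustration, not the project's actual model.

```python
from datetime import datetime

# Hypothetical trips; values invented for illustration.
trips = [
    {"pickup": datetime(2016, 3, 1, 0, 0), "payment_type": 1, "fare_amount": 9.0},
    {"pickup": datetime(2016, 3, 1, 0, 10), "payment_type": 2, "fare_amount": 24.5},
    {"pickup": datetime(2016, 3, 1, 0, 0), "payment_type": 1, "fare_amount": 12.0},
]


def build_dimension(values):
    """Map each distinct value to a surrogate integer key, in first-seen order."""
    keys = {}
    for value in values:
        if value not in keys:
            keys[value] = len(keys)
    return keys


datetime_keys = build_dimension(t["pickup"] for t in trips)
payment_keys = build_dimension(t["payment_type"] for t in trips)

# Dimension table: one row per distinct pickup time, with derived attributes.
datetime_dim = [
    {"datetime_id": key, "pickup": dt, "hour": dt.hour, "weekday": dt.weekday()}
    for dt, key in datetime_keys.items()
]

# Fact table: measures plus foreign keys into the dimensions.
fact_table = [
    {
        "trip_id": i,
        "datetime_id": datetime_keys[t["pickup"]],
        "payment_type_id": payment_keys[t["payment_type"]],
        "fare_amount": t["fare_amount"],
    }
    for i, t in enumerate(trips)
]
```

Keeping descriptive attributes in the dimensions and only keys and measures in the fact table tends to make aggregations in a warehouse such as BigQuery simpler to write.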

Mage Pipeline


[Image: Mage pipeline screenshot]
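Mage organizes a pipeline into blocks such as data loaders, transformers, and data exporters. The sketch below mirrors that three-step shape in plain Python so that it runs without Mage installed; it does not use the Mage API. The function names, the in-memory `warehouse` list standing in for a BigQuery table, the sample rows, and the rule of dropping zero-distance trips are all assumptions for illustration.

```python
def load_data():
    """Loader step: stand-in for reading the raw file from Cloud Storage."""
    return [
        {"trip_distance": "2.5", "fare_amount": "9.0"},
        {"trip_distance": "0.0", "fare_amount": "3.0"},
        {"trip_distance": "7.1", "fare_amount": "24.5"},
    ]


def transform(rows):
    """Transformer step: cast types and drop zero-distance trips (assumed rule)."""
    cleaned = []
    for row in rows:
        distance = float(row["trip_distance"])
        if distance > 0:
            cleaned.append({
                "trip_distance": distance,
                "fare_amount": float(row["fare_amount"]),
            })
    return cleaned


def export_data(rows, destination):
    """Exporter step: stand-in for writing rows to a BigQuery table."""
    destination.extend(rows)
    return len(rows)


warehouse = []  # In-memory placeholder for the target table.
exported = export_data(transform(load_data()), warehouse)
```

Because each step only takes and returns plain rows, any one of them can be swapped out or tested on its own, which is the main benefit of the block-based layout.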