The goal of this project is to perform data analytics on Uber data using various tools and technologies, including GCP Storage, Python, Compute Instance, Mage Data Pipeline Tool, BigQuery, and Looker Studio.
- Lay out the data pipeline architecture.
- Host the Uber data (CSV file) on staging storage (Google Cloud Storage).
- Set up a Google Compute Instance (VM) with Python and Mage to handle the ETL process.
- Model the data into various tables (star schema).
- Write Python scripts on Mage to:
  - Extract data from Google Cloud Storage.
  - Transform, filter, and split the data into multiple tables.
  - Load the data into the BigQuery schema.
- Create a new analytics table to feed the Looker Studio dashboard.
- Set up the Looker Studio dashboard to visualize the data in different charts.
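
The "split the data into multiple tables" step can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual Mage block: the function name `build_star_schema`, the table names `datetime_dim` and `fact_table`, the column names, and the sample rows are all assumptions made for the example.

```python
from datetime import datetime


def build_star_schema(trips):
    """Split flat trip rows into a datetime dimension and a fact table.

    `trips` is a list of dicts; the keys used here are hypothetical.
    """
    datetime_dim = []   # one row per distinct (pickup, drop-off) pair
    datetime_ids = {}   # (pickup, drop-off) -> surrogate key
    fact_table = []     # one row per trip, pointing at the dimension

    for trip_id, trip in enumerate(trips):
        key = (trip["pickup_datetime"], trip["dropoff_datetime"])
        if key not in datetime_ids:
            datetime_ids[key] = len(datetime_dim)
            pickup = trip["pickup_datetime"]
            datetime_dim.append({
                "datetime_id": datetime_ids[key],
                "pickup_datetime": pickup,
                "dropoff_datetime": trip["dropoff_datetime"],
                "pickup_hour": pickup.hour,
                "pickup_weekday": pickup.weekday(),  # Monday == 0
            })
        fact_table.append({
            "trip_id": trip_id,
            "datetime_id": datetime_ids[key],
            "passenger_count": trip["passenger_count"],
            "trip_distance": trip["trip_distance"],
            "fare_amount": trip["fare_amount"],
        })

    return {"datetime_dim": datetime_dim, "fact_table": fact_table}


# Made-up sample rows, for illustration only.
sample_trips = [
    {"pickup_datetime": datetime(2016, 3, 1, 0, 0), "dropoff_datetime": datetime(2016, 3, 1, 0, 7),
     "passenger_count": 1, "trip_distance": 2.5, "fare_amount": 9.0},
    {"pickup_datetime": datetime(2016, 3, 1, 8, 15), "dropoff_datetime": datetime(2016, 3, 1, 8, 40),
     "passenger_count": 2, "trip_distance": 6.1, "fare_amount": 21.5},
    {"pickup_datetime": datetime(2016, 3, 1, 8, 15), "dropoff_datetime": datetime(2016, 3, 1, 8, 40),
     "passenger_count": 1, "trip_distance": 6.1, "fare_amount": 21.5},
]

tables = build_star_schema(sample_trips)
```

The fact table keeps only measures and foreign keys, while descriptive attributes (hour, weekday) live in the dimension. A real pipeline would likely add further dimensions (payment type, rate code, locations) in the same way before loading each table into BigQuery.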
- Programming Language: Python
- Google Cloud Platform
  - Google Storage
  - Compute Instance
  - BigQuery
  - Looker Studio
TLC Trip Record Data

Yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
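
To make the record layout concrete, here is a small sketch that parses a few of these fields from CSV text into typed records. The column names (`tpep_pickup_datetime`, `tpep_dropoff_datetime`, and so on) and the timestamp format are assumptions based on the yellow taxi files and should be checked against the actual CSV header; the `TripRecord` class, the `parse_trips` helper, and the sample rows are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TripRecord:
    pickup: datetime
    dropoff: datetime
    passenger_count: int
    trip_distance: float
    payment_type: int
    fare_amount: float

    @property
    def duration_minutes(self) -> float:
        """Trip duration derived from the pick-up and drop-off timestamps."""
        return (self.dropoff - self.pickup).total_seconds() / 60


def parse_trips(csv_text: str) -> list[TripRecord]:
    """Parse CSV text into typed records (assumed column names and format)."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return [
        TripRecord(
            pickup=datetime.strptime(row["tpep_pickup_datetime"], fmt),
            dropoff=datetime.strptime(row["tpep_dropoff_datetime"], fmt),
            passenger_count=int(row["passenger_count"]),
            trip_distance=float(row["trip_distance"]),
            payment_type=int(row["payment_type"]),
            fare_amount=float(row["fare_amount"]),
        )
        for row in csv.DictReader(io.StringIO(csv_text))
    ]


# Made-up rows, for illustration only.
SAMPLE_CSV = """tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,payment_type,fare_amount
2016-03-01 00:00:00,2016-03-01 00:07:30,1,2.5,1,9.0
2016-03-01 08:15:00,2016-03-01 08:40:00,2,6.1,2,21.5
"""

records = parse_trips(SAMPLE_CSV)
```

Parsing timestamps and numeric fields up front makes derived values such as trip duration straightforward to compute later in the transform step.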
More info about the dataset can be found here: