- This project involves the acquisition of Formula 1 datasets from the Ergast API. These datasets are then transformed in three layers, i.e., Bronze -> Silver -> Gold, with the transformations executed in Databricks. The output of each transformation is loaded into Delta Lake so that the Analytics team can draw meaningful, practical insights from the data. The primary objective is to gain a comprehensive understanding of how Databricks works.
- The mission of this project is to transform the Bronze data (i.e., raw data) of different formats into Silver data (i.e., ingested data) in a columnar format (i.e., Parquet), and then into Gold data (i.e., presentation data), using PySpark in Databricks; a sketch of one such hop follows below.
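As a rough illustration of the layer-to-layer flow, here is a minimal PySpark sketch of one Bronze -> Silver -> Gold hop. The mount point `/mnt/formula1dl`, the `circuits` dataset, and its column names are illustrative assumptions, not the project's actual notebook code:

```python
# Minimal sketch of one Bronze -> Silver -> Gold hop in a Databricks notebook.
# Paths, dataset, and column names are assumed for illustration.
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

spark = SparkSession.builder.getOrCreate()  # already provided as `spark` in Databricks

# Bronze -> Silver: read the raw CSV, standardise columns, persist as Parquet
bronze_df = spark.read.option("header", True).csv("/mnt/formula1dl/bronze/circuits.csv")
silver_df = (
    bronze_df
    .withColumnRenamed("circuitId", "circuit_id")
    .withColumn("ingestion_date", current_timestamp())
)
silver_df.write.mode("overwrite").parquet("/mnt/formula1dl/silver/circuits")

# Silver -> Gold: shape the ingested data for analytics and load it into Delta Lake
gold_df = (
    spark.read.parquet("/mnt/formula1dl/silver/circuits")
    .select("circuit_id", "name", "location", "country")
)
gold_df.write.mode("overwrite").format("delta").save("/mnt/formula1dl/gold/circuits")
```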
- Ergast (https://ergast.com/mrd/)
- I have manually ingested these datasets, in their different formats, into Data Lake Gen2 (see Datasets).
- Azure Data Lake Gen2 Storage
- ADF Pipeline
- Databricks
- Azure Subscription
- Data Factory
- Data Lake Storage Gen2
- Azure Key Vault
- Azure Databricks Cluster (the sketch after this list shows how these pieces fit together from a notebook)
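Once these resources exist, a Databricks notebook typically reaches the storage account through a Key Vault-backed secret scope rather than a hard-coded key. A minimal sketch, assuming a secret scope named `formula1-scope`, a secret named `formula1dl-account-key`, and a storage account named `formula1dl` (all placeholders):

```python
# Sketch: access ADLS Gen2 from a Databricks notebook via a Key Vault-backed
# secret scope. Scope, secret, and account names below are assumptions.
# `dbutils` and `spark` are provided by the Databricks notebook environment.
storage_account = "formula1dl"
account_key = dbutils.secrets.get(scope="formula1-scope", key="formula1dl-account-key")

# Register the account key with the Spark session for the abfss:// filesystem
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    account_key,
)

# Quick sanity check: list the Bronze (raw) container
for f in dbutils.fs.ls(f"abfss://bronze@{storage_account}.dfs.core.windows.net/"):
    print(f.path)
```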
- Create a Linked Service to Azure Databricks
- Create a Linked Service to Azure Data Lake Storage (Gen2); an SDK sketch of both follows below
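In this project both linked services are created through the ADF UI; purely for illustration, the sketch below shows roughly equivalent calls with the azure-mgmt-datafactory Python SDK. Every name (subscription, resource group, factory, workspace URL, cluster id, secrets) is a placeholder:

```python
# Sketch only: create the two linked services programmatically.
# All identifiers are placeholders; the project itself uses the ADF UI.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobFSLinkedService,
    AzureDatabricksLinkedService,
    LinkedServiceResource,
    SecureString,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, adf = "<resource-group>", "<data-factory-name>"

# Linked service to the Azure Databricks workspace (existing interactive cluster)
client.linked_services.create_or_update(
    rg, adf, "ls_azure_databricks",
    LinkedServiceResource(properties=AzureDatabricksLinkedService(
        domain="https://<workspace>.azuredatabricks.net",
        access_token=SecureString(value="<databricks-access-token>"),
        existing_cluster_id="<cluster-id>",
    )),
)

# Linked service to Azure Data Lake Storage Gen2
client.linked_services.create_or_update(
    rg, adf, "ls_adls_gen2",
    LinkedServiceResource(properties=AzureBlobFSLinkedService(
        url="https://<storage-account>.dfs.core.windows.net",
        account_key="<storage-account-key>",
    )),
)
```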
- Create 1st Pipeline:
- Use a Get Metadata activity to check that the raw data exists, and execute the ingestion notebooks inside an If Condition activity only when it does (sketched below)
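A hedged sketch of this pipeline with the same SDK: a Get Metadata activity reports whether the raw data exists, and an If Condition activity runs the ingestion notebook only when it does. The dataset and notebook names are placeholders:

```python
# Sketch: Get Metadata -> If Condition -> Databricks notebook (placeholder names).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ActivityDependency,
    DatabricksNotebookActivity,
    DatasetReference,
    Expression,
    GetMetadataActivity,
    IfConditionActivity,
    LinkedServiceReference,
    PipelineResource,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, adf = "<resource-group>", "<data-factory-name>"

# Report whether the source folder/file exists
check = GetMetadataActivity(
    name="CheckRawDataExists",
    dataset=DatasetReference(reference_name="ds_raw_formula1"),
    field_list=["exists"],
)

# Ingestion notebook, run only inside the true branch below
ingest = DatabricksNotebookActivity(
    name="IngestCircuits",
    notebook_path="/ingestion/1.ingest_circuits",
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="ls_azure_databricks"),
)

# Gate the notebook on the Get Metadata result
gate = IfConditionActivity(
    name="IfRawDataExists",
    expression=Expression(value="@activity('CheckRawDataExists').output.exists"),
    if_true_activities=[ingest],
    depends_on=[ActivityDependency(activity="CheckRawDataExists",
                                   dependency_conditions=["Succeeded"])],
)

client.pipelines.create_or_update(rg, adf, "pl_ingest_formula1",
                                  PipelineResource(activities=[check, gate]))
```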
- Create 2nd Pipeline:
- Execute trans/1.race_results.ipynb first, then run trans/2.driver_standings.ipynb and trans/3.constructor_standings.ipynb on its success (sketched below)
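The fan-out on success can be expressed with activity dependencies; a sketch under the same placeholder setup (pipeline and activity names assumed):

```python
# Sketch: run 1.race_results first; both standings notebooks depend on its success.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ActivityDependency,
    DatabricksNotebookActivity,
    LinkedServiceReference,
    PipelineResource,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, adf = "<resource-group>", "<data-factory-name>"

dbx = LinkedServiceReference(type="LinkedServiceReference",
                             reference_name="ls_azure_databricks")

race_results = DatabricksNotebookActivity(
    name="RaceResults", notebook_path="/trans/1.race_results",
    linked_service_name=dbx)

# Both downstream notebooks wait for RaceResults to succeed, then run in parallel
driver_standings = DatabricksNotebookActivity(
    name="DriverStandings", notebook_path="/trans/2.driver_standings",
    linked_service_name=dbx,
    depends_on=[ActivityDependency(activity="RaceResults",
                                   dependency_conditions=["Succeeded"])])

constructor_standings = DatabricksNotebookActivity(
    name="ConstructorStandings", notebook_path="/trans/3.constructor_standings",
    linked_service_name=dbx,
    depends_on=[ActivityDependency(activity="RaceResults",
                                   dependency_conditions=["Succeeded"])])

client.pipelines.create_or_update(
    rg, adf, "pl_transform_formula1",
    PipelineResource(activities=[race_results, driver_standings,
                                 constructor_standings]))
```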
- Create 3rd Pipeline:
- Make the 2nd pipeline's execution depend on the success of the 1st pipeline
- Finally, execute the notebooks end to end (see the sketch below)
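The dependent execution can be modelled with two Execute Pipeline activities chained on success; a sketch with the same placeholder names:

```python
# Sketch: pipeline 3 chains pipeline 1 (ingest) into pipeline 2 (transform).
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ActivityDependency,
    ExecutePipelineActivity,
    PipelineReference,
    PipelineResource,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, adf = "<resource-group>", "<data-factory-name>"

run_ingest = ExecutePipelineActivity(
    name="RunIngestPipeline",
    pipeline=PipelineReference(reference_name="pl_ingest_formula1"),
    wait_on_completion=True)

run_transform = ExecutePipelineActivity(
    name="RunTransformPipeline",
    pipeline=PipelineReference(reference_name="pl_transform_formula1"),
    wait_on_completion=True,
    depends_on=[ActivityDependency(activity="RunIngestPipeline",
                                   dependency_conditions=["Succeeded"])])

client.pipelines.create_or_update(
    rg, adf, "pl_process_formula1",
    PipelineResource(activities=[run_ingest, run_transform]))
```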
- Create a tumbling window trigger scoped to the 3rd pipeline (sketched below)
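A tumbling window trigger fires the end-to-end pipeline over fixed, non-overlapping windows; a sketch, assuming a daily (24-hour) window and the placeholder pipeline name above:

```python
# Sketch: daily tumbling window trigger for the end-to-end pipeline.
from datetime import datetime

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineReference,
    TriggerPipelineReference,
    TriggerResource,
    TumblingWindowTrigger,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, adf = "<resource-group>", "<data-factory-name>"

trigger = TumblingWindowTrigger(
    pipeline=TriggerPipelineReference(
        pipeline_reference=PipelineReference(reference_name="pl_process_formula1")),
    frequency="Hour",
    interval=24,                      # one non-overlapping 24-hour window per day
    start_time=datetime(2021, 3, 1),  # windows are generated from this point on
    max_concurrency=1,
)

client.triggers.create_or_update(rg, adf, "tr_process_formula1",
                                 TriggerResource(properties=trigger))
# Triggers are created stopped; start explicitly (begin_start on current SDK versions)
client.triggers.begin_start(rg, adf, "tr_process_formula1").result()
```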
- Azure Data Factory
- Azure Databricks (PySpark)
- Azure Storage Account
- Azure Data Lake Gen2
- Azure Key Vault