Olympic_Data_engineering_project

An end-to-end data pipeline project that fetches Olympic data and derives insights from it.



End-to-End Olympic Data Engineering Using Azure

About the Project:

The Tokyo Olympic Data Engineering Project is an end-to-end data engineering solution that collects, processes, and analyzes data about the Tokyo Olympic Games using Azure services such as Azure Data Factory, Azure Databricks, and Azure Synapse Analytics.

Transformed data: https://github.com/rashmi0007/Olympic_Data_engineering_project/tree/main/Transformed_Olympic_DataSet

Data ingestion code: https://github.com/rashmi0007/Olympic_Data_engineering_project/blob/main/data_ingestion_pipelines_datafactory.JSON
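The ingestion pipeline is stored as a Data Factory JSON definition. As a rough aid to reading such a file, the sketch below lists each activity with its input and output datasets. It assumes the usual shape of an exported ADF pipeline (a `properties.activities` list whose Copy activities reference datasets by `referenceName`). The pipeline, activity, and dataset names in `SAMPLE_PIPELINE` are hypothetical placeholders, and the actual file in the repository may be structured differently (for example, wrapped in an ARM template).

```python
import json

# Hypothetical, heavily trimmed pipeline definition in the shape Azure Data
# Factory uses for exported pipelines. All names below are placeholders,
# not copied from the repository's JSON file.
SAMPLE_PIPELINE = """
{
  "name": "data_ingestion",
  "properties": {
    "activities": [
      {"name": "CopyAthletes", "type": "Copy",
       "inputs": [{"referenceName": "athletes_http", "type": "DatasetReference"}],
       "outputs": [{"referenceName": "athletes_raw", "type": "DatasetReference"}]},
      {"name": "CopyMedals", "type": "Copy",
       "inputs": [{"referenceName": "medals_http", "type": "DatasetReference"}],
       "outputs": [{"referenceName": "medals_raw", "type": "DatasetReference"}]}
    ]
  }
}
"""


def summarize_activities(pipeline_json: str) -> list[tuple[str, str, list[str], list[str]]]:
    """Return (name, type, input datasets, output datasets) for each activity."""
    pipeline = json.loads(pipeline_json)
    summary = []
    # Missing "properties" or "activities" keys yield an empty summary.
    for activity in pipeline.get("properties", {}).get("activities", []):
        inputs = [ref["referenceName"] for ref in activity.get("inputs", [])]
        outputs = [ref["referenceName"] for ref in activity.get("outputs", [])]
        summary.append((activity["name"], activity["type"], inputs, outputs))
    return summary


if __name__ == "__main__":
    for name, kind, ins, outs in summarize_activities(SAMPLE_PIPELINE):
        print(f"{name} ({kind}): {ins} -> {outs}")
```

To inspect the real definition, one could pass the contents of `data_ingestion_pipelines_datafactory.JSON` to `summarize_activities`, adjusting the key lookups if its layout differs.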

The project uses Azure Data Factory to manage and automate data integration and workflow processes: it extracts, transforms, and loads (ETL) data from different sources and stores it in the Data Lake. Azure Databricks then handles data processing and transformation. Databricks provides scalable, distributed processing for manipulating, cleaning, and aggregating the data, and it offers a collaborative environment in which data engineers and data scientists can work together.
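The cleaning and aggregation step can be illustrated with a small sketch. In Databricks this would normally be written against Spark DataFrames (PySpark); to keep the example runnable without a cluster, it uses Python's standard `csv` module on an in-memory sample instead. The column names (`Discipline`, `Female`, `Male`) and all values are made up for illustration and are not taken from the project's dataset.

```python
import csv
import io

# Made-up raw extract of entries per discipline by gender (hypothetical columns).
# Note the stray space in " 20" and the missing Female count for Discipline C.
RAW_CSV = """Discipline,Female,Male
Discipline A, 20,30
Discipline B,50,50
Discipline C,,12
"""


def clean_entries(raw_csv: str) -> list[dict]:
    """Trim whitespace, drop incomplete rows, cast counts to int, add derived columns."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        female_raw = (record.get("Female") or "").strip()
        male_raw = (record.get("Male") or "").strip()
        if not female_raw or not male_raw:
            continue  # drop rows with a missing count
        female, male = int(female_raw), int(male_raw)
        total = female + male
        rows.append({
            "Discipline": record["Discipline"].strip(),
            "Female": female,
            "Male": male,
            "Total": total,
            # Share of female entries, rounded to 3 places; 0.0 when there are no entries.
            "FemaleShare": round(female / total, 3) if total else 0.0,
        })
    return rows


if __name__ == "__main__":
    cleaned = clean_entries(RAW_CSV)
    for row in cleaned:
        print(row)
    # Simple aggregation across all remaining disciplines.
    print("Total entries:", sum(row["Total"] for row in cleaned))
```

In a Spark version, the same steps would map to trimming and casting columns, filtering out nulls, and adding derived columns before writing the result back to the Data Lake.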

Azure Synapse Analytics is used for data warehousing and advanced analytics; it supports storing and analyzing large volumes of structured and unstructured data.
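As an illustration of the kind of analytical query one might run once the data is in the warehouse, the sketch below ranks teams by gold medals. It simulates the warehouse with an in-memory SQLite database so it runs anywhere; the `medals` table, its columns, and the sample rows are hypothetical. In Synapse SQL pools the same query would be written in T-SQL, for example using `SELECT TOP 3 ...` instead of `LIMIT`.

```python
import sqlite3


def top_teams_by_gold(rows, limit=3):
    """Rank teams by gold medals, breaking ties by total medals, then by team name.

    `rows` is a list of (team, gold, silver, bronze) tuples (hypothetical schema).
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE medals (team TEXT, gold INTEGER, silver INTEGER, bronze INTEGER)"
        )
        conn.executemany("INSERT INTO medals VALUES (?, ?, ?, ?)", rows)
        query = """
            SELECT team, gold, gold + silver + bronze AS total
            FROM medals
            ORDER BY gold DESC, total DESC, team
            LIMIT ?
        """
        return conn.execute(query, (limit,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up sample rows, not real Olympic results.
    sample = [
        ("Team A", 5, 3, 2),
        ("Team B", 7, 1, 1),
        ("Team C", 5, 4, 4),
        ("Team D", 1, 0, 0),
    ]
    for team, gold, total in top_teams_by_gold(sample):
        print(f"{team}: {gold} gold, {total} total")
```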

Once transformed, the data can be visualized and analyzed in Tableau or Power BI.