Welcome to the "Real-Time Streaming with Azure Databricks" repository. This project demonstrates an end-to-end solution for real-time data streaming and analysis using Azure Databricks and Azure Event Hubs, with visualization in Power BI. It's an in-depth guide covering the setup, configuration, and implementation of a streaming data pipeline following the medallion architecture.
To get started with this project, clone the repository and follow the guidance provided in this YouTube tutorial.
- Real-time Data Processing with Azure Databricks (and Event Hubs).ipynb: The Databricks notebook used for data processing at each layer of the medallion architecture.
- data.txt: Contains sample data and JSON structures for streaming simulation.
- Azure Solution Architecture.png: High-level solution architecture.
- Active Azure subscription with access to Azure Databricks and Event Hubs.
- Databricks workspace with Unity Catalog enabled.
- Azure Event Hubs service.
- Power BI Desktop (Windows).
- Familiarity with Python, Spark, SQL, and basic data engineering concepts.
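
For readers new to the medallion architecture, the flow through its layers can be sketched in plain Python before opening the notebook. This is only an illustration under stated assumptions, not the notebook's code: the notebook runs on Spark in Databricks, while this sketch uses in-memory lists, and the field names `device` and `temperature` are hypothetical and not taken from data.txt.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def to_bronze(raw_messages):
    """Bronze: keep every raw payload untouched and add an ingestion timestamp."""
    now = datetime.now(timezone.utc).isoformat()
    return [{"body": message, "ingested_at": now} for message in raw_messages]


def to_silver(bronze_rows):
    """Silver: parse the JSON body, enforce types, and drop malformed rows."""
    silver = []
    for row in bronze_rows:
        try:
            event = json.loads(row["body"])
            silver.append({
                "device": str(event["device"]),              # hypothetical field
                "temperature": float(event["temperature"]),  # hypothetical field
                "ingested_at": row["ingested_at"],
            })
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            # Invalid JSON, a missing field, or a value that cannot be cast.
            continue
    return silver


def to_gold(silver_rows):
    """Gold: aggregate to a reporting-friendly shape (average per device)."""
    totals = defaultdict(lambda: [0.0, 0])  # device -> [sum, count]
    for row in silver_rows:
        totals[row["device"]][0] += row["temperature"]
        totals[row["device"]][1] += 1
    return {device: total / count for device, (total, count) in totals.items()}


if __name__ == "__main__":
    raw = [
        '{"device": "a", "temperature": 20}',
        '{"device": "a", "temperature": 22}',
        "not json",
        '{"device": "b"}',
    ]
    print(to_gold(to_silver(to_bronze(raw))))
```

The bronze step deliberately keeps malformed messages, so nothing is lost at ingestion; filtering and typing happen only in silver, and gold holds the aggregates a report would read.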