This is a real-time data engineering project that guides a cab company's drivers toward the areas of a city with the most trip requests. It is built with the following services:
- Azure Event Hubs
- Azure Data Lake Storage Gen2
- Azure Databricks
- Tableau
The architecture diagram for this project is shown below:
The main tasks involved are:
- Generating synthetic trip request data using streaming_data_generator.py.
- Sending the generated data to Event Hubs using a connection string.
- Storing the logical regions of the city as reference data in Azure Data Lake Storage Gen2.
- Importing live trip request data and transforming it with geographic_data.csv, using sliding windows and watermarking in Spark Structured Streaming.
- Displaying the results in Tableau dashboards through a live connection to Databricks.
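
The generation and windowing steps above can be sketched in plain Python. This is not the code from streaming_data_generator.py or the Databricks notebook; it is a minimal, standard-library illustration of the idea. The field names (request_id, region, event_time), the region names, and the function names are all assumptions made for this example. The watermark rule is also simplified: an event is dropped when it is older than the latest event time seen so far minus the allowed delay, which only approximates how Spark Structured Streaming handles late data.

```python
import json
import random
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical region names; the real project reads regions from geographic_data.csv.
REGIONS = ["north", "south", "east", "west"]
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def generate_trip_request(rng, event_time):
    """Build one synthetic trip request as a JSON string (field names are assumed)."""
    return json.dumps({
        "request_id": rng.randrange(1_000_000),
        "region": rng.choice(REGIONS),
        "event_time": event_time.isoformat(),
    })


def window_starts(event_time, window, slide):
    """Return the start of every sliding window [start, start + window) containing event_time."""
    # Align the most recent window start to a multiple of the slide interval.
    start = event_time - ((event_time - EPOCH) % slide)
    starts = []
    while start > event_time - window:
        starts.append(start)
        start -= slide
    return starts


def count_requests(messages, window, slide, watermark_delay):
    """Count requests per (window start, region), dropping events behind the watermark."""
    counts = Counter()
    max_seen = None
    dropped = 0
    for message in messages:
        record = json.loads(message)
        event_time = datetime.fromisoformat(record["event_time"])
        if max_seen is None or event_time > max_seen:
            max_seen = event_time
        # Simplified watermark: latest event time seen minus the allowed delay.
        if event_time < max_seen - watermark_delay:
            dropped += 1
            continue
        for start in window_starts(event_time, window, slide):
            counts[(start, record["region"])] += 1
    return counts, dropped


if __name__ == "__main__":
    rng = random.Random(42)
    base = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
    # Arrival order differs from event-time order, so one request arrives late.
    minutes = [1, 7, 3, 6]
    messages = [generate_trip_request(rng, base + timedelta(minutes=m)) for m in minutes]
    counts, dropped = count_requests(
        messages,
        window=timedelta(minutes=10),
        slide=timedelta(minutes=5),
        watermark_delay=timedelta(minutes=2),
    )
    for (start, region), n in sorted(counts.items()):
        print(start.isoformat(), region, n)
    print("dropped late events:", dropped)
```

With a 10-minute window sliding every 5 minutes, each request is counted in two overlapping windows, which is why the counts respond quickly to a change in demand while still smoothing over short bursts. In the actual pipeline, the messages would be sent to Event Hubs and the aggregation would be done by Spark rather than by a Python loop.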
I used the resources below, created in the Azure portal (apart from Tableau), to build this project end-to-end.
- Event Hubs
- Azure Databricks Service
- Data Factory (V2)
- Storage account
- Tableau