Generates a synthetic Spotify music-stream dataset and builds dashboards on top of it. An event generator, seeded with artist and track metadata from the Spotify API, emits fake listen events to Kafka. Spark consumes and processes the Kafka stream and saves it to the data lake. Airflow orchestrates the pipeline. dbt moves the data into Snowflake and transforms it there, and Metabase serves the dashboards.
- Songs: Leveraged the Spotify API to create artist and track data, extracted from a set of playlists. Each track includes title, artist, album, ID, release date, etc.
- Users: Created user demographics data with randomized first/last names, gender, and location details.
- Interactions: Real-time-like listening data linking users to the songs they "listened" to (a minimal producer sketch follows this list).
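
As a rough illustration of how interaction events could be emitted, here is a minimal producer sketch using `kafka-python` and `Faker`. The topic name `listen_events`, the broker address, and the event fields are assumptions for this sketch, not the project's actual schema; in the real pipeline the track catalog would come from the Spotify API extraction described above.

```python
import json
import random
import time
from datetime import datetime, timezone

from faker import Faker          # randomizes user demographics
from kafka import KafkaProducer  # pip install kafka-python

fake = Faker()

# Broker address and topic name are placeholders for this sketch.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# A tiny stand-in for the track catalog pulled from the Spotify API.
TRACKS = [
    {"track_id": "t1", "title": "Song A", "artist": "Artist X"},
    {"track_id": "t2", "title": "Song B", "artist": "Artist Y"},
]

while True:
    track = random.choice(TRACKS)
    event = {
        "user_id": fake.uuid4(),
        "first_name": fake.first_name(),
        "last_name": fake.last_name(),
        "gender": random.choice(["M", "F"]),
        "location": fake.city(),
        "track_id": track["track_id"],
        "listened_at": datetime.now(timezone.utc).isoformat(),
    }
    producer.send("listen_events", value=event)
    time.sleep(1)  # roughly one event per second
```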
Feel free to explore and analyze the datasets included in this repository to uncover patterns, trends, and insights in music listening and user interactions. If you have any questions or need more information about the dataset, refer to the documentation or reach out to the project contributors.
- Cloud - Azure
- Infrastructure as Code software - Terraform
- Containerization - Docker, Docker Compose
- Secrets Manager - Azure Key Vault (see the sketch after this list)
- Stream Processing - Apache Kafka, Spark Streaming
- Data Processing - Databricks
- Data Warehouse - Snowflake
- Pipeline Orchestration - Apache Airflow
- Warehouse Transformation - dbt
- Data Visualization - Metabase
- Language - Python
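
For the Key Vault entry above, secrets such as warehouse credentials can be fetched at runtime with the Azure SDK. A minimal sketch, assuming a placeholder vault URL and secret name:

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Vault URL and secret name are placeholders for this sketch.
VAULT_URL = "https://<your-vault-name>.vault.azure.net"

# DefaultAzureCredential picks up CLI, environment, or managed-identity auth.
client = SecretClient(vault_url=VAULT_URL, credential=DefaultAzureCredential())

snowflake_password = client.get_secret("snowflake-password").value
```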
- Set up a free Azure account & Azure Key Vault - Setup
- Set up Terraform and create resources - Setup
- SSH into VM (kafka-vm)
- Set up the Snowflake warehouse - Setup
- Set up the Databricks workspace & CDC (Change Data Capture) job - Setup (a streaming sketch follows this list)
- SSH into another VM (airflow-vm) (an Airflow DAG sketch follows as well)
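
The CDC job from the Databricks step could look roughly like the following Structured Streaming sketch (on Databricks the Kafka and Delta connectors are available out of the box). The broker address, topic, and data-lake paths are placeholders, not the project's actual configuration:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("spotify-stream").getOrCreate()

# Assumed subset of the event schema from the producer sketch above.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("track_id", StringType()),
    StructField("listened_at", StringType()),
])

# Read the raw Kafka stream; broker address and topic are placeholders.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka-vm:9092")
    .option("subscribe", "listen_events")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Write to the data lake as Delta; checkpoint and output paths are placeholders.
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/datalake/checkpoints/listen_events")
    .outputMode("append")
    .start("/mnt/datalake/raw/listen_events")
)
query.awaitTermination()
```

Similarly, a minimal sketch of an Airflow DAG for the orchestration step, assuming dbt is invoked via its CLI on the airflow-vm; the `dag_id`, schedule, and project path are assumptions:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="spotify_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@hourly",  # assumed cadence
    catchup=False,
) as dag:
    # Run the dbt models that transform raw events in Snowflake;
    # the project directory is a placeholder.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt/spotify && dbt run",
    )
```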
A lot can still be done :).
- Choose managed infra
  - Confluent Cloud for Kafka
- Write data quality tests
- Include CI/CD
- Add more visualizations