Audiophile End-To-End ELT Pipeline

An ELT pipeline that extracts data from Crinacle's Headphone and InEarMonitor databases and prepares it for a Metabase dashboard.

Architecture

[Image: Architecture]

Infrastructure is provisioned with Terraform. The pipeline is containerized with Docker and orchestrated with Airflow, and the dashboard is built in Metabase.

DAG Tasks:

  1. Scrape data from Crinacle's website to generate bronze data.
  2. Load bronze data to AWS S3.
  3. Initial data parsing and validation through Pydantic to generate silver data.
  4. Load silver data to AWS S3.
  5. Load silver data to AWS Redshift.
  6. Load silver data to AWS RDS for future projects.
  7. Transform data through dbt in the warehouse.
  8. Test the transformed data through dbt in the warehouse.
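
The bronze-to-silver step (task 3) can be sketched roughly as follows. The project does this with Pydantic; this sketch uses standard-library dataclasses as a stand-in so it runs without extra dependencies. The field names (`model`, `rank`, `price`), the sample rows, and the helper names are all hypothetical and are not taken from the project's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Made-up bronze rows: raw strings as they might come out of a scraper.
# Field names and values are assumptions for illustration only.
BRONZE_ROWS = [
    {"model": "Example IEM A", "rank": "A+", "price": "199"},
    {"model": "Example IEM B", "rank": "B", "price": "Discontinued"},
    {"model": "", "rank": "C", "price": "50"},
]


@dataclass(frozen=True)
class SilverRecord:
    """A typed, validated record (hypothetical silver schema)."""
    model: str
    rank: str
    price_usd: Optional[float]


def parse_price(raw: str) -> Optional[float]:
    """Convert a raw price string to a float, or None if it is not numeric."""
    cleaned = raw.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def to_silver(row: dict) -> SilverRecord:
    """Validate one bronze row; raise ValueError if a required field is missing."""
    model = row.get("model", "").strip()
    if not model:
        raise ValueError("model name is required")
    return SilverRecord(
        model=model,
        rank=row.get("rank", "").strip(),
        price_usd=parse_price(row.get("price", "")),
    )


def validate(rows):
    """Split rows into validated records and rejected (row, reason) pairs."""
    valid, rejected = [], []
    for row in rows:
        try:
            valid.append(to_silver(row))
        except ValueError as exc:
            rejected.append((row, str(exc)))
    return valid, rejected


silver, rejected = validate(BRONZE_ROWS)
```

Keeping rejected rows alongside the reason they failed, rather than dropping them silently, makes it easier to spot scraper breakage before the silver data is loaded to S3.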

Dashboard

[Image: Dashboard]

Requirements

  1. Configure AWS account through AWS CLI. [Required for Terraform]
  2. Terraform. [Required to provision AWS services]
  3. Docker / Docker-Compose. [Required to run Airflow DAG / pipeline]

Run Pipeline

  1. make infra: create AWS services. You will be asked to enter a password for your Redshift and RDS clusters.
  2. make config: generate configuration with Terraform outputs and AWS credentials.
  3. make base-build: build the base Airflow image with project requirements.
  4. make build: build the Docker images for Airflow.
  5. make up: run the pipeline.
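
If you prefer to run the steps above in one go, a small wrapper along these lines could work. This is only a sketch: `run_targets` and its `runner` parameter are hypothetical helpers, not part of the project, and the only thing assumed about the Makefile is the five target names listed above.

```python
import subprocess

# Make targets in the order listed above.
MAKE_TARGETS = ["infra", "config", "base-build", "build", "up"]


def run_targets(targets, runner=subprocess.run):
    """Run each make target in order and stop at the first failure.

    Returns the list of targets that completed successfully.
    `runner` is injectable so the function can be exercised without make.
    """
    completed = []
    for target in targets:
        # stdin is inherited, so the password prompt from `make infra` still works.
        result = runner(["make", target])
        if result.returncode != 0:
            print(f"make {target} failed with exit code {result.returncode}")
            break
        completed.append(target)
    return completed


# To run the full sequence from the repository root:
# run_targets(MAKE_TARGETS)
```

Stopping at the first failure matters here because later targets depend on earlier ones: `make config` needs the Terraform outputs produced by `make infra`.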