Pipeline that extracts data from Crinacle's Headphone and InEarMonitor databases and finalizes data for a Metabase Dashboard.
Infrastructure is provisioned through Terraform, and the pipeline is containerized with Docker and orchestrated with Airflow. The dashboard is built in Metabase.
DAG Tasks:
- Scrape data from Crinacle's website to generate bronze data.
- Load bronze data to AWS S3.
- Initial data parsing and validation through Pydantic to generate silver data.
- Load silver data to AWS S3.
- Load silver data to AWS Redshift.
- Load silver data to AWS RDS for future projects.
- Transform and test data through dbt in the warehouse.
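
The task list above can be read as a dependency graph. The following is a minimal sketch of that ordering using only the Python standard library, not the project's actual Airflow DAG: the task ids are hypothetical, and the edges (for example, that the Redshift and RDS loads both wait on the silver S3 load) are assumptions inferred from the list.

```python
from graphlib import TopologicalSorter

# Hypothetical task ids. Each key maps to the set of tasks it waits on.
# These edges are assumed from the README's task list, not read from the DAG file.
TASK_DEPENDENCIES = {
    "scrape_bronze": set(),
    "load_bronze_to_s3": {"scrape_bronze"},
    "validate_to_silver": {"load_bronze_to_s3"},
    "load_silver_to_s3": {"validate_to_silver"},
    "load_silver_to_redshift": {"load_silver_to_s3"},
    "load_silver_to_rds": {"load_silver_to_s3"},
    "dbt_transform": {"load_silver_to_redshift"},
    "dbt_test": {"dbt_transform"},
}


def execution_order(dependencies):
    """Return one valid run order; raises graphlib.CycleError if there is a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for position, task in enumerate(execution_order(TASK_DEPENDENCIES), start=1):
        print(position, task)
```

In Airflow the same edges would typically be declared between operators; the sketch only illustrates which steps must finish before others can start.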
- Configure AWS account through AWS CLI. [Required for Terraform]
- Terraform. [Required to provision AWS services]
- Docker / Docker-Compose. [Required to run Airflow DAG / pipeline]
- make infra: create AWS services. You will be asked to enter a password for your Redshift and RDS clusters.
- make config: generate configuration with Terraform outputs and AWS credentials.
- make base-build: build base airflow image with project requirements.
- make build: build docker images for airflow.
- make up: run the pipeline.
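
The make targets above can be chained from a small script. This is a sketch under the assumption that the targets are meant to run in the order listed and from the repository root; the helper names `make_commands` and `run_setup` are hypothetical and not part of the project. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Target names come from the list above; running them in listed order is an assumption.
MAKE_TARGETS = ["infra", "config", "base-build", "build", "up"]


def make_commands(targets=MAKE_TARGETS):
    """Build the argument list for each make invocation."""
    return [["make", target] for target in targets]


def run_setup(dry_run=True):
    """Print each command, or run it when dry_run is False.

    With check=True, a failing target raises CalledProcessError and
    stops the sequence before later targets run.
    """
    commands = make_commands()
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_setup(dry_run=True)
```

Since `make infra` asks for the Redshift and RDS passwords, a real run (`dry_run=False`) should be started from an interactive terminal so the prompts can be answered.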