chau0/data_pipeline_air_flow

An project example to demonstrate how to create a data pipe line with apache airflow

Apache-2.0

Overview

This project demonstrates how to create a data pipeline with Apache Airflow.

Extract raw data from S3 to Redshift
Transform and loading from staging tables to dimension tables
Perform data quality checks to monitor data pipeline and track data lineage

Dataset

The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.

Directory structure

Step to run

Build project docker image

Data pipeline

Database schema