This project is the final project of the Data Engineer Bootcamp at Digital Skola. It exercises end-to-end ETL data pipeline skills using Apache Airflow as the orchestrator and scheduler, MySQL as storage for the staging area, and PostgreSQL as the data warehouse, with all of these services running in Docker containers. The source data is retrieved from an Application Programming Interface (API), and the data warehouse contains 5 data marts. The DAG is developed with 4 tasks following the diagram below.
- Set up Docker Compose to run all the tools, then start them
- Develop an Extract and Load script that loads the source data into the MySQL staging area
- Develop a Transform and Load script that builds 3 data mart dimension tables with the following specification:
  - Province table
    - province_id
    - province_name
  - District table
    - district_id
    - province_id
    - district_name
  - Case table
    - id
    - status_name (suspect, closecontact, probable, confirmation)
    - status_detail
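The dimension-table transform described above can be sketched as pure functions that deduplicate the staged records. This is a minimal illustration only: the staged field names (`kode_prov`, `nama_prov`, `kode_kab`, `nama_kab`) and the helper names are assumptions, not the exact schema used in the real scripts.

```python
# Hypothetical sketch of the dimension-table transform; field names are
# illustrative stand-ins for whatever the staged API records actually use.

def build_province_dim(rows):
    """Deduplicate (province_id, province_name) pairs from staged rows."""
    seen = {}
    for r in rows:
        seen[r["kode_prov"]] = r["nama_prov"]
    return [{"province_id": k, "province_name": v} for k, v in sorted(seen.items())]


def build_district_dim(rows):
    """Deduplicate districts, keeping the province each district belongs to."""
    seen = {}
    for r in rows:
        seen[r["kode_kab"]] = (r["kode_prov"], r["nama_kab"])
    return [{"district_id": k, "province_id": p, "district_name": n}
            for k, (p, n) in sorted(seen.items())]


# Small hand-made sample standing in for rows read from the staging area.
staged = [
    {"kode_prov": 32, "nama_prov": "Jawa Barat", "kode_kab": 3201, "nama_kab": "Bogor"},
    {"kode_prov": 32, "nama_prov": "Jawa Barat", "kode_kab": 3202, "nama_kab": "Sukabumi"},
    {"kode_prov": 32, "nama_prov": "Jawa Barat", "kode_kab": 3201, "nama_kab": "Bogor"},
]
provinces = build_province_dim(staged)
districts = build_district_dim(staged)
```

In the real pipeline the resulting rows would be written to PostgreSQL rather than kept in memory.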
- Develop a Transform and Load script that builds 2 data mart fact tables with the following specification:
  - Province Daily table
    - id (auto-generated)
    - province_id
    - case_id
    - date
    - total
  - District Daily table
    - id (auto-generated)
    - district_id
    - case_id
    - date
    - total
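The daily fact tables above are essentially a group-by over the staged rows. The sketch below illustrates the Province Daily aggregation with assumed field names and sample data; the District Daily table would be built the same way with `district_id` as the key.

```python
# Illustrative sketch of the daily fact aggregation; the sample rows and
# field names are assumptions, not the exact schema of the real scripts.
from collections import defaultdict
from itertools import count


def build_province_daily(rows):
    """Sum case totals per (province_id, case_id, date) combination."""
    totals = defaultdict(int)
    for r in rows:
        totals[(r["province_id"], r["case_id"], r["date"])] += r["total"]
    ids = count(1)  # stand-in for the auto-generated surrogate key
    return [{"id": next(ids), "province_id": p, "case_id": c, "date": d, "total": t}
            for (p, c, d), t in sorted(totals.items())]


# Hand-made sample standing in for rows read from the staging area.
staged = [
    {"province_id": 32, "case_id": 1, "date": "2021-07-01", "total": 10},
    {"province_id": 32, "case_id": 1, "date": "2021-07-01", "total": 5},
    {"province_id": 32, "case_id": 2, "date": "2021-07-01", "total": 3},
]
province_daily = build_province_daily(staged)
```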
- Develop a DAG with the necessary tasks following the diagram
- Simulate the ETL data pipeline process
- Everything is set up with 2 Docker Compose files:
  - `aiflow/docker-compose.yml`
  - `config_db/docker-compose.yml`
- Developed the Extract and Load script for the staging area, namely
  - `aiflow/dags/pyjobs/stagging_api_mysql.py`
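The extract-and-load step can be sketched as two small functions: one fetching the raw JSON payload from the API, one writing it into the staging table. This is a self-contained illustration, not the real script: the URL is a placeholder, and `sqlite3` stands in for the MySQL staging database so the example runs anywhere.

```python
# Sketch of the extract-and-load step. API_URL is a placeholder endpoint,
# and sqlite3 stands in for MySQL purely to keep the example self-contained.
import json
import sqlite3
from urllib.request import urlopen

API_URL = "https://example.com/api/cases"  # placeholder, not the real source


def extract(url=API_URL):
    """Fetch the raw JSON payload from the source API."""
    with urlopen(url) as resp:
        return json.load(resp)


def load_staging(conn, records):
    """Write raw records into the staging table, replacing the previous load."""
    conn.execute("DROP TABLE IF EXISTS stg_cases")
    conn.execute("CREATE TABLE stg_cases (payload TEXT)")
    conn.executemany("INSERT INTO stg_cases VALUES (?)",
                     [(json.dumps(r),) for r in records])
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM stg_cases").fetchone()[0]


if __name__ == "__main__":
    # Load a tiny hand-made record set instead of calling the network.
    conn = sqlite3.connect(":memory:")
    load_staging(conn, [{"kode_prov": 32, "total": 10}])
```

Keeping the staging load as a drop-and-recreate makes the task idempotent, so a retried Airflow run does not duplicate rows.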
- Developed the Transform and Load script that builds the 3 data mart dimension tables, namely
  - `aiflow/dags/pyjobs/dim_datamart_mysql_psql.py`
- Developed the Transform and Load scripts that build the 2 data mart fact tables, namely
  - `aiflow/dags/pyjobs/fact_datamart_district_daily.py`
  - `aiflow/dags/pyjobs/fact_datamart_provice_daily.py`
- Developed the DAG with the task sequence
  `task1 >> task2 >> [task3, task4]`,
  namely `aiflow/dags/dag_final_project.py`
- The full ETL data pipeline simulation ran successfully