Brandless Data Engineering Take-Home Exercise

Setup:

  1. Scheduler: Apache Airflow
  2. Database: Postgres 10.0 (dev) / Amazon Redshift (prod)
  3. Intermediate file/object storage: local file system and S3
  4. Languages used: Python, SQL
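
The dev/prod database split above can be handled by a single code path. The sketch below is a minimal illustration, assuming Airflow connection IDs `postgres_dev` and `redshift_prod` (hypothetical names) and an `ENVIRONMENT` variable; the real project may wire this up differently.

```python
import os

from airflow.hooks.postgres_hook import PostgresHook

def get_warehouse_hook():
    """Return a hook for whichever warehouse the current environment uses."""
    # "postgres_dev" and "redshift_prod" are hypothetical Airflow connection IDs.
    conn_id = "redshift_prod" if os.getenv("ENVIRONMENT") == "prod" else "postgres_dev"
    # Redshift speaks the Postgres wire protocol, so the same hook works for both.
    return PostgresHook(postgres_conn_id=conn_id)
```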

In this exercise we set up an Airflow scheduler that queries the Edamam Recipe Search API for pasta recipes and pulls the health and diet labels associated with each recipe in our recipes table. Daily incremental batch jobs load this data into three tables (a sketch of the API call follows the list):

  1. recipe_health_lables
  2. recipe_diet_lables
  3. recipes
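
Below is a hedged sketch of the API pull that feeds these tables. It assumes the public v1 search endpoint (`https://api.edamam.com/search`) and placeholder `EDAMAM_APP_ID`/`EDAMAM_APP_KEY` credentials; the exact parameters in the real job may differ.

```python
import os

import requests

def fetch_pasta_recipes(start=0, end=10):
    """Fetch a page of pasta recipes with their health and diet labels."""
    resp = requests.get(
        "https://api.edamam.com/search",  # assumed v1 search endpoint
        params={
            "q": "pasta",
            "app_id": os.environ["EDAMAM_APP_ID"],    # placeholder credential
            "app_key": os.environ["EDAMAM_APP_KEY"],  # placeholder credential
            "from": start,
            "to": end,
        },
        timeout=30,
    )
    resp.raise_for_status()
    hits = resp.json().get("hits", [])
    # Each hit carries the recipe plus its labels, which feed the
    # recipes, recipe_health_lables, and recipe_diet_lables tables.
    return [
        {
            "label": h["recipe"]["label"],
            "health_labels": h["recipe"].get("healthLabels", []),
            "diet_labels": h["recipe"].get("dietLabels", []),
        }
        for h in hits
    ]
```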

Each table is loaded by a separate Python task, and the three tasks are wired together as an Airflow DAG, sketched below.
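
The following is a minimal sketch of that DAG, assuming Airflow 1.x-era imports and hypothetical load functions; the dependency ordering shown (recipes before the two label tables) is an assumption, not taken from the exercise.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# load_recipes, load_health_labels, load_diet_labels are hypothetical callables
# that fetch the API results and write them to their respective tables.
from loaders import load_recipes, load_health_labels, load_diet_labels

dag = DAG(
    dag_id="edamam_pasta_recipes",  # hypothetical DAG id
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",     # daily incremental batch
    catchup=False,
)

load_recipes_task = PythonOperator(
    task_id="load_recipes", python_callable=load_recipes, dag=dag
)
load_health_task = PythonOperator(
    task_id="load_recipe_health_lables", python_callable=load_health_labels, dag=dag
)
load_diet_task = PythonOperator(
    task_id="load_recipe_diet_lables", python_callable=load_diet_labels, dag=dag
)

# Assumed ordering: populate recipes before the two label tables.
load_recipes_task >> load_health_task
load_recipes_task >> load_diet_task
```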