Airflow declarative DAGs via YAML.
Compatibility:

- Python 2.7 / 3.5+
- Airflow 1.8+ (should also work with older versions, at least down to 1.7)
- Declarative DAGs in plain-text YAML make it much easier to understand what a DAG will look like. Made for humans, not just programmers.
- It makes it extremely hard to turn your DAGs into a code mess. Even if you build a complicated YAML generator, the result stays readable for humans.
- No more guilt about coupling business logic with the task management system (Airflow). The two can now coexist separately.
- Static analysis becomes a trivial task.
- It's a good abstraction for building your own scheduler/worker that stays compatible with the original Airflow one.
Check the tests/dags directory for examples of DAGs that will work and ones that won't. Use the src/airflow_declarative/schema.py module as the reference for the YAML file schema; it should be self-descriptive.
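Purely as an illustration of the general shape (the key names below are assumptions, not verbatim from the schema; schema.py remains the authoritative reference), a declarative DAG file might look roughly like this:

```yaml
# Hypothetical sketch only -- the real key names and types live in
# src/airflow_declarative/schema.py.
dags:
  my_dag:
    args:
      start_date: 2017-01-01
      schedule_interval: '@daily'
    operators:
      say_hello:
        class: airflow.operators.bash_operator:BashOperator
        args:
          bash_command: echo "Hello, world!"
```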
Don't be shy to experiment: trafaret-config will help you understand what went wrong, why, and where.
To use with the current (up to the 1.8.2 release) upstream Airflow, you need to provide DAGs via a Python file anyway. That file should look something like this:
import os

import airflow_declarative

# Replace with the path to your own directory of YAML DAGs.
ROOT = '/usr/local/share/airflow'

DAGS = [
    airflow_declarative.from_path(os.path.join(ROOT, item))
    for item in os.listdir(ROOT)
    if item.endswith(('.yml', '.yaml'))
]

globals().update({dag.dag_id: dag for dag in DAGS})
Place such a file in the AIRFLOW_HOME directory; Airflow will then load the DAGs in the old-fashioned way.
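The globals() trick works because Airflow's legacy loader imports each Python file in the DAGs folder and collects the module-level objects that are DAG instances. A stdlib-only sketch of that discovery mechanism (the Dag class and dag ids below are made up for illustration):

```python
# Mimic Airflow's legacy DAG discovery with a dummy Dag class.
class Dag:
    def __init__(self, dag_id):
        self.dag_id = dag_id

# Pretend these were built from YAML files by airflow_declarative.from_path().
DAGS = [Dag('etl_daily'), Dag('reports_hourly')]  # hypothetical dag ids

# Expose each DAG under its own module-level name so a scanner can find it.
globals().update({dag.dag_id: dag for dag in DAGS})

# A scanner like Airflow's would now see them among the module globals:
found = {name: obj for name, obj in globals().items() if isinstance(obj, Dag)}
print(sorted(found))  # ['etl_daily', 'reports_hourly']
```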
Check out the patches directory for patches against Airflow releases that add native declarative DAG support. With a patched Airflow, no Python files are needed on the AIRFLOW_HOME path: just put your YAMLs there and they'll get loaded automagically.