
airflow-declarative

Airflow declarative DAGs via YAML.

Compatibility:

  • Python 2.7 / 3.5+
  • Airflow 1.8+ (should also work with older versions, at least down to 1.7)

Key Features

  • Declarative DAGs in plain-text YAML make it much easier to see what a DAG will look like. Made for humans, not just programmers.
  • It is extremely hard to turn your DAGs into a code mess. Even if you build a complicated YAML generator, the result stays readable for humans.
  • No more guilt about coupling business logic with the task management system (Airflow): the two can now live separately.
  • Static analysis becomes a trivial task.
  • It is a good abstraction for building your own scheduler or worker that stays compatible with the original Airflow one.
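To illustrate the static-analysis point, here is a stdlib-only sketch: once a YAML file is parsed, a DAG definition is just a nested dict, so lint rules are plain functions over it. The key names used below (`dags`, `tasks`, `class`) are purely illustrative assumptions, not the actual contract of `schema.py`.

```python
# Hypothetical lint check over a parsed declarative DAG definition.
# The dict below stands in for the result of parsing a YAML file; the
# key names ("dags", "tasks", "class") are illustrative only -- consult
# src/airflow_declarative/schema.py for the real schema.

def lint_dag(doc):
    """Return a list of human-readable problems found in one parsed document."""
    problems = []
    for dag_id, dag in doc.get("dags", {}).items():
        tasks = dag.get("tasks", {})
        if not tasks:
            problems.append(f"{dag_id}: DAG defines no tasks")
        for task_id, task in tasks.items():
            if "class" not in task:
                problems.append(f"{dag_id}.{task_id}: missing operator class")
    return problems


doc = {
    "dags": {
        "etl": {
            "tasks": {
                "extract": {"class": "airflow.operators.bash_operator.BashOperator"},
                "load": {},  # deliberately broken: no operator class
            }
        }
    }
}

print(lint_dag(doc))  # -> ['etl.load: missing operator class']
```

No Airflow scheduler has to run for such a check; it can live in an ordinary CI step.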

Examples

Check the tests/dags directory for examples of DAGs that will work and of DAGs that won't. Use the src/airflow_declarative/schema.py module as the reference for the YAML file schema; it should be self-descriptive.
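For orientation only, a declarative DAG file has roughly the following shape. This is a hypothetical sketch: the key names and nesting here are assumptions and are not guaranteed to match the real schema, so treat src/airflow_declarative/schema.py and tests/dags as the source of truth.

```yaml
# Hypothetical example -- verify key names against schema.py.
dags:
  my_dag:
    args:
      start_date: 2017-01-01
      schedule_interval: "@daily"
    operators:
      say_hello:
        class: airflow.operators.bash_operator.BashOperator
        args:
          bash_command: echo hello
```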

Don't be shy to experiment: trafaret-config will help you understand what went wrong, why, and where.

Usage

Upstream Airflow

To use declarative DAGs with current upstream Airflow (up to the 1.8.2 release), you still need to expose them via a Python file. It should look something like this:

import os

import airflow_declarative

ROOT = '/usr/local/share/airflow'  # adjust to wherever your YAML DAGs live
DAGS = [
    airflow_declarative.from_path(os.path.join(ROOT, item))
    for item in os.listdir(ROOT)
    if item.endswith(('.yml', '.yaml'))
]

globals().update({dag.dag_id: dag for dag in DAGS})

Place such a file in the AIRFLOW_HOME directory; Airflow will then load the DAGs in the old-fashioned way.

Patched Airflow

Check out the patches directory for patches against Airflow releases that add native declarative DAG support. In that case no Python files are needed in AIRFLOW_HOME at all: just put your YAMLs there and they will get loaded automagically.