WordPress/openverse-catalog

[Improvement] Create DAG objects at top level

Closed this issue · 0 comments

Current Situation

Many of our DAG definition files have the following format:

DAG_ID = ...
DEFAULT_1 = ...
DEFAULT_2 = ...

def create_dag(value_1=DEFAULT_1, value_2=DEFAULT_2, ...):
    dag = DAG(DAG_ID, value_1, value_2)

    with dag:
        get_some_operator("foobar")

    return dag

globals()[DAG_ID] = create_dag()

Much of this machinery is unnecessary, since create_dag is only ever called with its default arguments.

Suggested Improvement

The above could be rewritten as:

DAG_ID = ...
DEFAULT_1 = ...
DEFAULT_2 = ...

dag = DAG(DAG_ID, DEFAULT_1, DEFAULT_2)

with dag:
    get_some_operator("foobar")

Since dag is a top-level attribute of the module, Airflow will pick it up and add it to the DagBag. We can do this in most cases; the exception appears to be common_api_workflows, whose create_dag function is used in testing.
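
To illustrate why both styles end up registered, here is a rough, simplified sketch of module-attribute discovery. It is not Airflow's actual loading code: the DAG class is a stand-in so the example runs without Airflow installed, and collect_dags and load_module are hypothetical helpers written for this illustration only.

```python
import types


class DAG:
    """Stand-in for airflow.DAG (assumption: only dag_id matters here)."""

    def __init__(self, dag_id):
        self.dag_id = dag_id


def collect_dags(module):
    """Return the sorted IDs of all DAG instances among a module's attributes."""
    return sorted(
        obj.dag_id for obj in vars(module).values() if isinstance(obj, DAG)
    )


def load_module(name, source):
    """Build a module object from source text, with the stand-in DAG available."""
    module = types.ModuleType(name)
    module.DAG = DAG
    exec(source, module.__dict__)
    return module


# Current style: a factory whose result is injected through globals().
FACTORY_STYLE = '''
DAG_ID = "factory_dag"

def create_dag():
    return DAG(DAG_ID)

globals()[DAG_ID] = create_dag()
'''

# Suggested style: the DAG object is simply assigned at the top level.
TOP_LEVEL_STYLE = '''
DAG_ID = "top_level_dag"

dag = DAG(DAG_ID)
'''

factory_module = load_module("factory_style", FACTORY_STYLE)
top_level_module = load_module("top_level_style", TOP_LEVEL_STYLE)

print(collect_dags(factory_module))    # ['factory_dag']
print(collect_dags(top_level_module))  # ['top_level_dag']
```

In both cases the DAG object is just another entry in the module's namespace, so the globals() indirection buys nothing when the factory is only called once with its defaults.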

Benefit

Reduced code, simpler DAG files.

Additional context

Came about from a discussion in #238

Implementation

  • 🙋 I would be interested in implementing this feature.