UX Design: Deploying Pipelines to Airflow
Goal
Generate an Airflow DAG for an artifact and add it to a user-specified Airflow server.
Current user workflow
The data scientist has developed an artifact, say an ML model called `clf`, in a Jupyter notebook. To create an Airflow DAG, they would have to manually write a Python script that translates the notebook code into the Airflow DSL to construct a DAG. This DAG file is then placed in the DAG folder of the Airflow server they are submitting it to.
User workflow with Linea
Note: This is agnostic of the entry point to Linea (CLI or IPython). We will discuss the UX at the API level.
Airflow config: the user specifies the URI for `AIRFLOW_HOME` in a Linea config file, say in `lineapy/config.yml`.
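A config entry for this might look like the following; the key names and file layout are illustrative, not a settled schema:

```yaml
# lineapy/config.yml (hypothetical layout)
airflow:
  # URI of the Airflow server's AIRFLOW_HOME; generated dag.py files land here
  airflow_home: /home/user/airflow
```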
The user first calls `lineapy.save(clf)` to get the `LineaArtifact` object associated with `clf`, named `clf_artifact`. The user then invokes `lineapy.to_airflow(clf_artifact)` to generate a `dag.py` file and send it to the `AIRFLOW_HOME` directory.
`to_airflow()` takes an optional dict argument so that users familiar with the DAG's input parameters, such as `schedule_interval` and `max_active_runs`, can specify them.
Function signature for `to_airflow()`:

```python
def to_airflow(
    artifacts: Union[LineaArtifact, List[LineaArtifact]],
    props: Optional[Dict[str, str]] = None,
) -> None:
    ...
```
This allows users to pass in multiple artifacts for a single DAG.
Note: `to_airflow()` handles the transfer of the `dag.py` file to `AIRFLOW_HOME`, as in the current implementation. This is potentially a point of further discussion.
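One way the transfer step could work, assuming the generated file is simply copied into the server's `dags/` folder on a local or shared filesystem; the helper name and paths are illustrative:

```python
import shutil
from pathlib import Path


def deploy_dag(dag_file: str, airflow_home: str) -> Path:
    """Copy a generated dag.py into AIRFLOW_HOME/dags, creating the folder if needed."""
    dags_dir = Path(airflow_home) / "dags"
    dags_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(dag_file, dags_dir))
```

This only covers the case where `AIRFLOW_HOME` is reachable as a filesystem path; a remote Airflow server would need some other transport (scp, object storage, an API upload), which the issue leaves open as a discussion point.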
Desiderata
- No dependency on `airflow` from `lineapy`
Proposed solution
Construct `dag.py` using Jinja templates.
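The template approach can be sketched as follows. For brevity this uses the stdlib `string.Template` as a stand-in for Jinja (which additionally supports loops over tasks and artifacts), and the template body, DAG parameters, and names are illustrative:

```python
from string import Template

# Skeleton of the generated dag.py; only a string is produced here,
# so airflow itself is never imported by the generator (the desideratum above).
DAG_TEMPLATE = Template(
    '''\
from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="$dag_id",
    schedule_interval="$schedule_interval",
) as dag:
    task = PythonOperator(task_id="$task_id", python_callable=$callable_name)
'''
)

# Fill in the template from artifact metadata and user-supplied props.
dag_py = DAG_TEMPLATE.substitute(
    dag_id="clf_dag",
    schedule_interval="@daily",
    task_id="clf_task",
    callable_name="run_clf",
)
print(dag_py)
```

Because the generator only renders text, the `airflow` package is needed on the Airflow server that executes `dag.py`, not in the environment running `lineapy`.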