UX Design: Deploying Pipelines to Airflow
Goal
Generate an Airflow DAG for an artifact and add it to a user-specified Airflow server.
Current user workflow
The data scientist has developed an artifact, say an ML model called `clf`, in a Jupyter notebook. To create an Airflow DAG, they would have to manually write a Python script that translates the notebook code into the Airflow DSL to construct a DAG. This DAG file is then placed in the DAG folder of the Airflow server they are submitting it to.
User workflow with Linea
Note: This is agnostic of the entry point to Linea (CLI or IPython). We will discuss the UX at the API level.
Airflow config: the user specifies the URI for `AIRFLOW_HOME` in a Linea config file, say in `lineapy/config.yml`.
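A config entry for this might look like the following; the key names and file layout are illustrative, not a settled schema:

```yaml
# lineapy/config.yml (hypothetical layout)
airflow:
  # URI of the Airflow server's AIRFLOW_HOME; generated dag.py files land here
  airflow_home: /home/user/airflow
```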
The user first calls `lineapy.save(clf)` to get the `LineaArtifact` object associated with `clf`, named `clf_artifact`. The user then invokes `lineapy.to_airflow(clf_artifact)` to generate a `dag.py` file and send it to the `AIRFLOW_HOME` directory.
`to_airflow()` takes an optional dict argument so that users familiar with the DAG's input parameters, such as `schedule_interval` and `max_active_runs`, can specify them.
Function signature for `to_airflow()`:

```python
def to_airflow(
    artifacts: Union[LineaArtifact, List[LineaArtifact]],
    props: Optional[Dict[str, str]] = None,
) -> None:
    ...
```
This allows users to pass in multiple artifacts for a single DAG.
Note: `to_airflow()` handles the transfer of the `dag.py` file to `AIRFLOW_HOME`, as in the current implementation. This is potentially a point of further discussion.
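One way the transfer step could work, assuming the generated file is simply copied into the server's `dags/` folder on a local or shared filesystem; the helper name and paths are illustrative:

```python
import shutil
from pathlib import Path


def deploy_dag(dag_file: str, airflow_home: str) -> Path:
    """Copy a generated dag.py into AIRFLOW_HOME/dags, creating the folder if needed."""
    dags_dir = Path(airflow_home) / "dags"
    dags_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(dag_file, dags_dir))
```

This only covers the case where `AIRFLOW_HOME` is reachable as a filesystem path; a remote Airflow server would need some other transport (scp, object storage, an API upload), which the issue leaves open as a discussion point.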
Desiderata
- No dependency on `airflow` from `lineapy`
Proposed solution
Construct `dag.py` using Jinja templates.
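The template approach can be sketched as follows. For brevity this uses the stdlib `string.Template` as a stand-in for Jinja (which additionally supports loops over tasks and artifacts), and the template body, DAG parameters, and names are illustrative:

```python
from string import Template

# Skeleton of the generated dag.py; only a string is produced here,
# so airflow itself is never imported by the generator (the desideratum above).
DAG_TEMPLATE = Template(
    '''\
from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="$dag_id",
    schedule_interval="$schedule_interval",
) as dag:
    task = PythonOperator(task_id="$task_id", python_callable=$callable_name)
'''
)

# Fill in the template from artifact metadata and user-supplied props.
dag_py = DAG_TEMPLATE.substitute(
    dag_id="clf_dag",
    schedule_interval="@daily",
    task_id="clf_task",
    callable_name="run_clf",
)
print(dag_py)
```

Because the generator only renders text, the `airflow` package is needed on the Airflow server that executes `dag.py`, not in the environment running `lineapy`.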