Support sending parameterized SQL queries to Databricks Jobs
jlaneve opened this issue · 0 comments
Problem statement
Currently, the DatabricksWorkflowTaskGroup only supports creating notebook tasks via the DatabricksNotebookOperator. While this unlocks all Databricks Python-based development (and, to some extent, SQL through spark.sql commands), it does not let users take advantage of Databricks SQL, which limits the flows users can create.
To solve this, we should add support for sql_task tasks. sql_task tasks allow a Databricks job to reference query objects that have been created in the Databricks SQL editor. These queries can be parameterized by the user at runtime.
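For context, a sql_task entry in a Jobs API 2.1 payload looks roughly like the sketch below; the query ID, warehouse ID, and parameter values are illustrative placeholders, not real resources:

```python
# Rough shape of a Jobs API 2.1 sql_task that references a saved query.
# All IDs and parameter values below are placeholders.
sql_task_payload = {
    "task_key": "run_saved_query",
    "sql_task": {
        # Points at a query object created in the Databricks SQL editor.
        "query": {"query_id": "<saved-query-id>"},
        # sql_task runs on a SQL warehouse rather than a job cluster.
        "warehouse_id": "<warehouse-id>",
        # User-supplied parameters substituted into the query at runtime.
        "parameters": {"run_date": "2024-01-01"},
    },
}
```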
Solving this issue would involve two steps:
1. Create a DatabricksSqlQueryOperator that expects a query ID instead of a raw SQL query. When run outside of a DatabricksWorkflowTaskGroup, this operator should be able to launch and monitor a SQL task on its own.
2. Add a convert_to_databricks_workflow_task method to convert the SQL operator task into a workflow task when it is used inside the task group. (A rough interface sketch follows this list.)
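As a starting point, the operator's interface could look something like the sketch below. Everything here is illustrative, not a final design: the class does not exist yet, the convert_to_databricks_workflow_task signature mirrors the one on DatabricksNotebookOperator, and the standalone execute path just submits a one-time run through the existing DatabricksHook and polls it:

```python
from __future__ import annotations

import time

from airflow.exceptions import AirflowException
from airflow.models import BaseOperator
from airflow.providers.databricks.hooks.databricks import DatabricksHook


class DatabricksSqlQueryOperator(BaseOperator):
    """Sketch of the proposed operator: runs a saved Databricks SQL query by ID."""

    # Allow Jinja templating of query parameters, e.g. {"run_date": "{{ ds }}"}.
    template_fields = ("parameters",)

    def __init__(
        self,
        *,
        query_id: str,
        warehouse_id: str,
        parameters: dict | None = None,
        databricks_conn_id: str = "databricks_default",
        polling_period_seconds: int = 30,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.query_id = query_id
        self.warehouse_id = warehouse_id
        self.parameters = parameters or {}
        self.databricks_conn_id = databricks_conn_id
        self.polling_period_seconds = polling_period_seconds

    def convert_to_databricks_workflow_task(self, relevant_upstreams, context=None):
        """Emit the sql_task dict used when this task runs inside a
        DatabricksWorkflowTaskGroup instead of as a standalone run."""
        return {
            "task_key": self.task_id.replace(".", "__"),
            "sql_task": {
                "query": {"query_id": self.query_id},
                "warehouse_id": self.warehouse_id,
                "parameters": self.parameters,
            },
        }

    def execute(self, context):
        """Standalone path: submit a one-time run containing only this
        sql_task and poll it until it reaches a terminal state."""
        hook = DatabricksHook(databricks_conn_id=self.databricks_conn_id)
        run_id = hook.submit_run(
            {"tasks": [self.convert_to_databricks_workflow_task(relevant_upstreams=[])]}
        )
        while True:
            state = hook.get_run_state(run_id)
            if state.is_terminal:
                break
            time.sleep(self.polling_period_seconds)
        if not state.is_successful:
            raise AirflowException(f"SQL task run {run_id} failed: {state.state_message}")
```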
For this issue to be considered complete, a SQL query task should be added to the example DAG and run through CI/CD; a hypothetical version of that addition is sketched below.
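The example DAG addition might look roughly like this, reusing the sketch class above. The DatabricksWorkflowTaskGroup import path, connection ID, and Databricks IDs are assumptions for illustration:

```python
# Hypothetical example-DAG addition. DatabricksSqlQueryOperator is the
# sketch class from above, not an existing operator, and the import path
# for DatabricksWorkflowTaskGroup may differ in this repo.
from datetime import datetime

from airflow.models import DAG
from airflow.providers.databricks.operators.databricks_workflow import (
    DatabricksWorkflowTaskGroup,
)

with DAG(
    dag_id="example_databricks_sql_workflow",
    start_date=datetime(2024, 1, 1),
    schedule=None,
) as dag:
    with DatabricksWorkflowTaskGroup(
        group_id="databricks_workflow",
        databricks_conn_id="databricks_default",
    ):
        # Runs the saved query as a sql_task inside the workflow; the
        # parameter value is templated by Airflow before submission.
        run_saved_query = DatabricksSqlQueryOperator(
            task_id="run_saved_query",
            query_id="<saved-query-id>",
            warehouse_id="<warehouse-id>",
            parameters={"run_date": "{{ ds }}"},
        )
```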