Duplicate Job Creation in Databricks During Airflow DAG Runs
Hang1225 commented
Issue
Our teams at HealthPartners are encountering a recurring issue where each execution of an Airflow DAG creates a new Databricks job, even though the job already exists in the Databricks workspace.
This is most likely caused by the Databricks Jobs REST API returning at most 20 jobs per request by default. When a workspace contains more than 20 jobs, additional requests using the `next_page_token` from the previous response are required to fetch the complete job list. Without that pagination, an existing job that falls beyond the first page is never found by the lookup, so a duplicate job is created on every DAG run.
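For reference, a minimal sketch of exhaustive pagination against the Jobs API 2.1 `list` endpoint. The `host` and `token` values are placeholders, and the helper itself is illustrative rather than part of the provider code:

```python
import requests


def list_all_jobs(host: str, token: str) -> list[dict]:
    """Collect every job in the workspace by following next_page_token.

    `host` (e.g. https://<workspace>.cloud.databricks.com) and `token`
    are placeholders for the workspace URL and a personal access token.
    """
    jobs: list[dict] = []
    page_token = None
    while True:
        params = {"limit": 25}  # the 2.1 list endpoint caps each page at 25 jobs
        if page_token:
            params["page_token"] = page_token
        resp = requests.get(
            f"{host}/api/2.1/jobs/list",
            headers={"Authorization": f"Bearer {token}"},
            params=params,
            timeout=30,
        )
        resp.raise_for_status()
        payload = resp.json()
        jobs.extend(payload.get("jobs", []))
        page_token = payload.get("next_page_token")
        if not page_token:  # no token means the last page was reached
            break
    return jobs
```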
Proposed Solution
Under "_get_job_by_name" function in operators/workflow.py:
- Directly pass the `job_name` parameter to the `jobs_api.list_jobs()` method to leverage the API's built-in job-name filtering. This is more efficient than fetching the exhaustive job list and filtering for the specific job afterwards, and it sidesteps the pagination problem entirely (see the sketch below).
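A minimal sketch of what the change could look like, assuming the installed `databricks-cli` version forwards a `name` keyword to `JobsApi.list_jobs()` (the underlying `/api/2.1/jobs/list` endpoint supports an exact, case-insensitive `name` filter); if the client does not expose it, the raw endpoint can be called with `name` as a query parameter instead:

```python
from typing import Any


def _get_job_by_name(job_name: str, jobs_api: Any) -> dict | None:
    # Ask the API to filter by exact (case-insensitive) job name rather than
    # downloading the whole job list and filtering client-side.
    # NOTE: the `name=` keyword is an assumption about the installed
    # databricks-cli version; older releases only accept offset/limit.
    response = jobs_api.list_jobs(name=job_name, version="2.1")
    for job in response.get("jobs", []):
        if job.get("settings", {}).get("name") == job_name:
            return job
    return None
```

With the filter applied server-side, the 20-jobs-per-page default no longer affects the lookup, since the response contains only jobs matching the requested name.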