mrpaulandrew/procfwk

Pipeline status running after timeout setting pipeline result

NJLangley opened this issue · 1 comment

Describe the bug
When the 04-Infant pipeline sets the pipeline result, a connectivity issue can cause the stored procedure call that sets the result to time out. When that happens, the PipelineStatus in the CurrentExecution table remains 'Running'. When the next stage starts, the pipeline with a status of 'Running' does not block execution of the pipelines that depend on it.

Affected services

  • Data Factory/Synapse
  • SQL Database

To Reproduce
This is difficult to reproduce: whenever we have seen it, the cause was an Azure connectivity error. It can be simulated by changing the proc to return early for a specific worker pipeline.

Expected behaviour
If a pipeline status cannot be set successfully, it should be cleaned up before the next stage starts. The status should be set to an error, because the exact outcome of the worker pipeline cannot be determined. The 04-Infant pipeline should also retry setting the result: the cause is most likely a brief connectivity blip, since other worker pipelines finishing within a few seconds did not suffer the same issue.
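
To illustrate the retry behaviour described above, here is a minimal Python sketch. It is not the framework's code: `set_result_with_retry`, its parameters, and the use of `TimeoutError` to stand in for a timed-out proc call are all assumptions made for illustration.

```python
import time


def set_result_with_retry(set_result, retries=2, delay_seconds=5, sleep=time.sleep):
    """Call set_result(); on a timeout, wait and retry up to `retries` more times.

    `set_result` is a hypothetical callable standing in for the proc call
    that records the pipeline result. `sleep` is injectable so the wait
    can be skipped in tests.
    """
    attempts = retries + 1  # the first attempt plus the allowed retries
    for attempt in range(1, attempts + 1):
        try:
            return set_result()
        except TimeoutError:
            if attempt == attempts:
                # Out of retries: surface the failure so the caller can
                # record an error status instead of leaving 'Running'.
                raise
            sleep(delay_seconds)
```

If every attempt times out, the exception propagates, which is the point at which the status would need to be cleaned up before the next stage starts.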

Screenshots
Error - Timeout setting pipeline status

Additional context
I have fixed this by reducing the timeout (it is a simple proc call that should not block for long), allowing 2 retries, and setting the interval between retries to 5 seconds. I have also changed the proc procfwk.CheckForBlockedPipelines to check for pipelines in prior stages that still have a status of 'Running' and to raise errors for them (which also sets their status). The normal blocked pipeline logic then runs, and the framework continues according to the error handling mode.
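
The extra check can be sketched in Python as follows. This models the idea only, not the actual T-SQL in procfwk.CheckForBlockedPipelines: the `Execution` record, the `fail_stale_running` function, and the `"Error"` status value are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

ERROR_STATUS = "Error"  # placeholder; the framework's real status value may differ


@dataclass
class Execution:
    pipeline: str
    stage: int
    status: str


def fail_stale_running(executions, current_stage):
    """Mark pipelines from earlier stages that are still 'Running' as errored.

    Returns the names of the pipelines that were changed, so the usual
    blocked-pipeline logic can then treat them like any other failure.
    """
    stale = []
    for execution in executions:
        if execution.stage < current_stage and execution.status == "Running":
            execution.status = ERROR_STATUS
            stale.append(execution.pipeline)
    return stale
```

A pipeline running in the current stage is left alone; only rows from prior stages are treated as stale, because by the time the next stage starts they should all have reported a result.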