mrpaulandrew/procfwk

Triggering the Parent pipeline multiple times could overwrite preceding in-progress runs

htwashere opened this issue · 4 comments

Describe the bug
When the parent pipeline is triggered but the framework has not yet reached the stage of registering the run in the framework database, a second trigger from the user can potentially overwrite the preceding run(s).

Affected services
Which resource within the processing framework does this affect?

  • Data Factory
  • SQL Database (I believe the subsequently triggered pipeline will overwrite the records of the preceding one)
  • Functions
  • All of them
  • Other

To Reproduce
Steps to reproduce the behavior:

  1. Trigger 02-Parent, then immediately trigger 02-Parent again. Technically, this creates two trigger runs. Recognizing that the current framework is not designed to allow multiple triggering of the same "job", this is technically not an allowed operation. However, the current integrity check relies on the existence of the framework database records, and in this scenario those records have not yet been written/committed to the database, so the framework does not know that a job has already been triggered.

PS: this is one of the reasons for Enhancement Request #61

Expected behaviour
The second triggered run should fail (just as it would if the integrity check had detected the first run via the database records).

Screenshots
Sorry, I no longer have screenshots, as I have since created my own workaround detection for this.

Additional context
My current workaround is an additional function that checks the ADF run log to see whether a triggered job has already occurred (in my version, I have implemented a variant of enhancement request #61).

@htwashere

Thanks for your feedback.

It's a great idea to have an ADF-only pre-trigger check that doesn't rely on content from the database.

Would you be willing to share your workaround code? Does it use the Management API?

I'm not sure about this being considered a bug; maybe a negative test. By design, the framework assumes a tight coupling between the factory and the database. But I accept that it is possible to create multiple triggers as described.

Hopefully a user would know not to do this under normal circumstances.

Hi Paul, I hope all is well. I'm not sure how to upload my code, so I will email it to you shortly. Below is a quick explanation of how I tackled it.

Since my logic includes the "container" piece (per #61), it may take a bit of explaining. Anyway, I realize that with the logic below I can actually avoid needing your stored proc check on execution. Basically, I check all the pipeline runs in a given time range (for example, the last X minutes) to see if the container is still in an active state; if it is, I know the pipeline is already being executed.

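Something along these lines, using the Data Factory .NET management SDK (this is a simplified sketch rather than my exact code; the tenant/app/subscription values and names like lookbackMinutes are placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure.Management.DataFactory;
using Microsoft.Azure.Management.DataFactory.Models;
using Microsoft.IdentityModel.Clients.ActiveDirectory;
using Microsoft.Rest;

public static class RunningPipelineCheck
{
    // Placeholder settings; supply your own tenant/app/factory details.
    private const string TenantId = "<tenant-id>";
    private const string AppId = "<service-principal-app-id>";
    private const string AppSecret = "<service-principal-secret>";
    private const string SubscriptionId = "<subscription-id>";

    public static bool IsPipelineAlreadyRunning(
        string resourceGroup, string factoryName, string pipelineName, int lookbackMinutes)
    {
        // Authenticate against the Azure Management API with a service principal.
        var authContext = new AuthenticationContext($"https://login.microsoftonline.com/{TenantId}");
        var token = authContext
            .AcquireTokenAsync("https://management.azure.com/", new ClientCredential(AppId, AppSecret))
            .GetAwaiter().GetResult();

        var client = new DataFactoryManagementClient(new TokenCredentials(token.AccessToken))
        {
            SubscriptionId = SubscriptionId
        };

        // Query all runs of this pipeline updated within the last X minutes.
        var filterParams = new RunFilterParameters(
            lastUpdatedAfter: DateTime.UtcNow.AddMinutes(-lookbackMinutes),
            lastUpdatedBefore: DateTime.UtcNow,
            filters: new List<RunQueryFilter>
            {
                new RunQueryFilter(
                    RunQueryFilterOperand.PipelineName,
                    RunQueryFilterOperator.Equals,
                    new List<string> { pipelineName })
            });

        var runs = client.PipelineRuns.QueryByFactory(resourceGroup, factoryName, filterParams);

        // Any run still queued or in progress means the "job" is already active.
        return runs.Value.Any(r => r.Status == "Queued" || r.Status == "InProgress");
    }
}
```

The lookback window is a compromise: it needs to be long enough to catch a run triggered moments earlier, while the status filter takes care of ignoring runs that have already completed.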

Cheers,
Henry

@htwashere
Thanks for the email.

Yes, understood. My check worker status function is actually doing something similar; it's based on an earlier blog post where I used a fixed time window to get the latest execution run/status.
https://mrpaulandrew.com/2019/11/21/get-any-azure-data-factory-pipeline-run-status-with-azure-functions/

Thanks for the input. I'll create an internal backlog feature for this request, hopefully reusing the existing check status function.

Of course, it does conflict with the other feature request to have multiple instances of the framework triggered, so I'll think about including this as part of a wider set of options/properties.

Cheers

@htwashere just to follow up, I've baked this (check for running pipeline) behaviour into the framework and parent pipeline as a utility.
Thanks again for the great idea. I also blogged about it separately here while developing: https://mrpaulandrew.com/2020/11/12/get-data-factory-to-check-itself-for-a-running-pipeline-via-the-azure-management-api/
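For anyone following along, the utility works against the Management API's queryPipelineRuns operation; roughly, the underlying call looks like this (the subscription, resource group, and factory names are placeholders):

```http
POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.DataFactory/factories/{factoryName}/queryPipelineRuns?api-version=2018-06-01

{
  "lastUpdatedAfter": "2020-11-12T09:00:00Z",
  "lastUpdatedBefore": "2020-11-12T10:00:00Z",
  "filters": [
    { "operand": "PipelineName", "operator": "Equals", "values": ["02-Parent"] },
    { "operand": "Status", "operator": "Equals", "values": ["InProgress"] }
  ]
}
```

Any run returned in the response's value array means the pipeline is already active, and the parent can stop itself before doing any database work.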