Task completion event lost
ravig-kant opened this issue · 3 comments
Describe the bug
We are facing an issue where a Conductor task remains IN_PROGRESS indefinitely. The task runs inside a DO_WHILE loop along with other tasks, in the following order:
UploadPrepare -> Upload_collectItem_Output -> Upload_item_start -> Upload -> Upload_item_end
In the attached screenshot, for iteration 135, Upload_item_start__135 is shown as IN_PROGRESS, even though we had already marked it COMPLETED. That update did trigger the next task of the same iteration, Upload__135, which has itself completed.
This looks like a lost update, and as a result the workflow never completes.
Details
Conductor version: 3.18
Persistence implementation: Postgres
Queue implementation: Dynoqueues
Lock: Redis
Expected behavior
The task and the workflow should have been completed.
Hi @ravig-kant, what database backend are you using?
@v1r3n We are using Postgres as the backend.
This is not a race condition within the persistence engine but one in the general design. In this example, the task emits a Kafka message, and the response that marks the task as COMPLETED arrives before the task has been marked IN_PROGRESS. The original thread then resumes, runs its remaining code to mark the task IN_PROGRESS, and moves it from COMPLETED back to IN_PROGRESS.
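To make the ordering concrete, here is a minimal, deterministic Python replay of the interleaving described above. The in-memory dict, the function names, and the SCHEDULED starting status are assumptions for illustration only; this is not Conductor's code. Each write simply overwrites the stored status, which is how an unconditional update behaves.

```python
def unconditional_update(store: dict, task_id: str, status: str) -> None:
    # Last writer wins: the write ignores whatever status is already stored.
    store[task_id] = status


def replay_race() -> str:
    # Hypothetical in-memory stand-in for the task table (not Conductor's schema).
    task_id = "Upload_item_start__135"
    store = {task_id: "SCHEDULED"}

    # 1. The original thread starts the task, which emits a Kafka message.
    # 2. The Kafka response arrives first, and the task is marked COMPLETED.
    unconditional_update(store, task_id, "COMPLETED")

    # 3. The original thread resumes and writes IN_PROGRESS, overwriting COMPLETED.
    unconditional_update(store, task_id, "IN_PROGRESS")

    return store[task_id]


print(replay_race())  # IN_PROGRESS
```

The COMPLETED write is not rejected; it is silently overwritten, which matches the "lost update" seen in the screenshot.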
This behaviour would be the same with any persistence engine; it could only be fixed by giving the update logic enough extra handling to cover this case, for example through conditional updates.
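One possible shape of a conditional update is a guarded `UPDATE` that refuses to move a task out of a terminal status. The sketch below uses Python's built-in `sqlite3` so it runs without a server; the same `WHERE ... AND status NOT IN (...)` guard can be written for Postgres. The `task` table, its columns, the helper names, and the terminal-status subset are hypothetical and do not reflect Conductor's actual schema or code.

```python
import sqlite3

# Illustrative subset of terminal task statuses (assumption, not exhaustive).
TERMINAL = ("COMPLETED", "FAILED", "CANCELED")


def conditional_update(conn: sqlite3.Connection, task_id: str, new_status: str) -> bool:
    """Apply the write only if the task is not already terminal; return True if applied."""
    placeholders = ", ".join("?" for _ in TERMINAL)
    cur = conn.execute(
        f"UPDATE task SET status = ? WHERE task_id = ? AND status NOT IN ({placeholders})",
        (new_status, task_id, *TERMINAL),
    )
    conn.commit()
    # rowcount is 0 when the guard rejected the write.
    return cur.rowcount == 1


def replay_race_with_guard():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task (task_id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute("INSERT INTO task VALUES (?, ?)", ("Upload_item_start__135", "SCHEDULED"))

    # Same order as the race: COMPLETED arrives first, the stale IN_PROGRESS second.
    completed = conditional_update(conn, "Upload_item_start__135", "COMPLETED")
    stale = conditional_update(conn, "Upload_item_start__135", "IN_PROGRESS")

    (final,) = conn.execute(
        "SELECT status FROM task WHERE task_id = ?", ("Upload_item_start__135",)
    ).fetchone()
    conn.close()
    return completed, stale, final


print(replay_race_with_guard())  # (True, False, 'COMPLETED')
```

The caller can inspect the returned flag and skip any follow-up work for a rejected stale write. A version column with optimistic concurrency (`WHERE version = ?`) is another way to express the same idea.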