Task is executed twice after workflow repair
astelmashenko opened this issue · 2 comments
Describe the bug
We noticed that a task is sometimes executed twice. After enabling debug logs, we found that after WorkflowRepairService re-queued the task, it was for some reason executed two times:
INFO 2022-07-04T07:56:38,583 147034 com.netflix.conductor.core.reconciliation.WorkflowRepairService [sweeper-thread-1] Task 425d9c94-dc30-441b-b21b-73ccc5118829 in workflow d6e20f06-c884-4c25-81a4-4a7c0eb3827e re-queued for repairs
DEBUG 2022-07-04T07:56:42,994 151445 com.netflix.conductor.contribs.tasks.http.HttpTask [system-task-worker-1] Response: 200, {bills={partyAUTHOR={biId=5200737, status=OPEN}, partyUNIVERSITY={biId=5200740, status=OPEN}}}, task:425d9c94-dc30-441b-b21b-73ccc5118829
DEBUG 2022-07-04T07:56:42,994 151445 com.netflix.conductor.contribs.tasks.http.HttpTask [system-task-worker-0] Response: 200, {bills={partyAUTHOR={biId=5200738, status=OPEN}, partyUNIVERSITY={biId=5200739, status=OPEN}}}, task:425d9c94-dc30-441b-b21b-73ccc5118829
What does WorkflowRepairService do, and do we need it at all? Why does this happen even though we have a lock service configured?
Thanks.
Details
Conductor version: 3.7.2
Persistence implementation: Postgres
Queue implementation: Postgres
Lock: Redis
To Reproduce
This happens from time to time; we have not found steps to reproduce it.
Expected behavior
HTTP task must be executed only once.
The original issue was opened in conductor-community: Netflix/conductor-community#70
But nobody has responded in months.
Hi @astelmashenko, WorkflowRepairService checks for the taskId before pushing anything into the queue. Are you using locks in your configuration? There is a high chance that workflow execution is not guarded by locks, so the task may be picked up by two different threads.
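To illustrate the failure mode described above, here is a minimal sketch of two worker threads receiving the same task ID and an atomic "claim once" guard that lets only one of them execute it. This is not Conductor code: `ClaimOnceDemo`, `tryClaim`, and `runDuplicateDelivery` are hypothetical names, and the in-memory set is only a stand-in for whatever shared lock or store a real deployment would use.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ClaimOnceDemo {
    // Hypothetical in-memory record of task IDs that have already been claimed.
    private static final Set<String> CLAIMED = ConcurrentHashMap.newKeySet();

    /** Returns true only for the first caller that claims the given task ID. */
    static boolean tryClaim(String taskId) {
        return CLAIMED.add(taskId); // atomic: exactly one concurrent caller gets true
    }

    /** Hands the same task ID to several threads at once and returns how many executed it. */
    static int runDuplicateDelivery(String taskId, int workers) {
        AtomicInteger executions = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    start.await(); // release all workers together to provoke the race
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (tryClaim(taskId)) {
                    executions.incrementAndGet(); // stands in for the HTTP call
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return executions.get();
    }

    public static void main(String[] args) {
        int n = runDuplicateDelivery("425d9c94-dc30-441b-b21b-73ccc5118829", 2);
        System.out.println("executions = " + n);
    }
}
```

Without the `tryClaim` check, both threads would run the body, which matches the two `Response: 200` lines in the log. If the lock cannot be relied on, making the downstream HTTP endpoint idempotent is another way to tolerate duplicate delivery.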
@manan164, yes, we are using a lock (Redis). What I have in mind is a Conductor upgrade. For example, we fix something in our custom task and redeploy Conductor while thousands of workflows are running. How does it stop? For example, does it stop the decider first, wait for all running tasks to complete, close connections, and then shut down Conductor?
The question: is the shutdown process deterministic, and is there evidence that it shuts down gracefully?
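For reference, the shutdown order I have in mind can be sketched with plain `java.util.concurrent` executors. This is only an illustration of the sequence being asked about, not a description of what Conductor actually does: `GracefulShutdownSketch`, `drain`, `shutDown`, and `demo` are hypothetical names, and the two executors stand in for the decider and the system task workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GracefulShutdownSketch {
    /** Stops accepting new work and waits for running work; returns false if it had to interrupt. */
    static boolean drain(ExecutorService pool, long timeoutSeconds) {
        pool.shutdown(); // reject new submissions, keep running what is already queued
        try {
            if (pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow(); // timeout or interrupt: cancel whatever is still running
        return false;
    }

    /** Applies the steps in a fixed order and records what happened at each one. */
    static List<String> shutDown(ExecutorService decider, ExecutorService workers, long timeoutSeconds) {
        List<String> steps = new ArrayList<>();
        // 1. Stop the decider first so no new tasks are scheduled.
        steps.add(drain(decider, timeoutSeconds) ? "decider stopped" : "decider interrupted");
        // 2. Wait for in-flight tasks to complete.
        steps.add(drain(workers, timeoutSeconds) ? "workers drained" : "workers interrupted");
        // 3. Placeholder for closing DB, queue, and lock connections.
        steps.add("connections closed");
        return steps;
    }

    /** Runs the sequence against one short in-flight task. */
    static List<String> demo() {
        ExecutorService decider = Executors.newSingleThreadExecutor();
        ExecutorService workers = Executors.newFixedThreadPool(2);
        workers.submit(() -> {
            try {
                Thread.sleep(100); // simulated in-flight task
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return shutDown(decider, workers, 5);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If a task outlives the timeout, this sketch interrupts it, and that is exactly the case where a task could later be re-queued and run a second time.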