Netflix/conductor

Conductor workflow stalled after a sub-workflow

rajeshwar-nu opened this issue · 3 comments

Hi Team,
I am experiencing this issue in the latest version of Conductor: #3491

Stack

  1. orkesio/orkes-conductor-community:1.1.11
  2. Redis for workflow execution - docker.io/bitnami/redis:7.0.8-debian-11-r0
  3. Postgres for workflow persistence - ghcr.io/cloudnative-pg/postgresql:15.3

Description of issue

A workflow gets stuck in the RUNNING state right after a sub-workflow completes. We observed this in several of our workflows, all of which contain a sub-workflow. The issue is intermittent: it happens only for a few executions.

I have attached 3 images, one for each of 3 sample failures:

workflow1 (1)
workflow2 (1)
workflow3 (1)

The problem is resolved when we pause and resume the stuck workflow, after which it completes normally.
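
For anyone who wants to script this workaround, here is a minimal sketch. It assumes the server exposes Conductor's standard REST endpoints `PUT /api/workflow/{workflowId}/pause` and `PUT /api/workflow/{workflowId}/resume`, and that the API is reachable at `http://localhost:8080/api`. The base URL and the helper names `workflow_action_url` and `pause_and_resume` are illustrative assumptions, not part of the original report.

```python
import urllib.request

# Assumption: the Conductor REST API base URL; adjust for your deployment.
BASE_URL = "http://localhost:8080/api"


def workflow_action_url(workflow_id: str, action: str, base_url: str = BASE_URL) -> str:
    """Build the URL for a pause or resume call on one workflow execution."""
    if action not in ("pause", "resume"):
        raise ValueError(f"unsupported action: {action}")
    return f"{base_url.rstrip('/')}/workflow/{workflow_id}/{action}"


def pause_and_resume(workflow_id: str, base_url: str = BASE_URL) -> None:
    """Pause, then resume, a workflow that appears stuck in RUNNING."""
    for action in ("pause", "resume"):
        request = urllib.request.Request(
            workflow_action_url(workflow_id, action, base_url),
            method="PUT",
        )
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(request, timeout=10) as response:
            print(f"{action}: HTTP {response.status}")
```

Calling `pause_and_resume("<workflowId>")` with the ID of a stuck execution would issue the two calls in order. This only automates the workaround; it does not address why the parent workflow stalls in the first place.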

Slack Message

Hi @rajeshwar-nu, are these sub-workflows retried or restarted?

Hey @manan164, no, they are not.

@rajeshwar-nu, do they have a double underscore in the name?