Step retrying even after updating to not retry
Using 0.1.0
I deployed a new pipeline with the default retry strategy. Errors inside my container then caused Dataflow to retry the affected messages.
I updated my spec with the snippet below and expected the retries to stop, but they continued for that message.
Is this expected behaviour?
Step spec:
spec:
  steps:
    - container:
        image: abc
      name: enrich-query
      scale:
        desiredReplicas: limit(currentReplicas + pendingDelta / (60 * 250) + pending / (10 * 60 * 250), 1, 3, 1)
        peekDelay: |-
          "20m"
        scalingDelay: |-
          "1m"
      sources:
        - kafka:
            name: TOPIC
            topic: TOPIC
            brokers:
              - 'kafka-headless.kafka.svc.cluster.local:9092'
            startOffset: Last
          retry:
            steps: 0
Referenced: https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/301-erroring-pipeline.yaml
Even after creating a new pipeline with the above spec, the retries still happen.
It should definitely be possible to have no retry.
What do the runner logs say?
Logs from the init container inside the Step:
time="2021-10-21T12:40:30Z" level=info msg=cpu numCPU=8
time="2021-10-21T12:40:30Z" level=info msg=process pid=1
time="2021-10-21T12:40:30Z" level=info msg=version version=v0.1.0
time="2021-10-21T12:40:30Z" level=info msg="not enabling pprof debug endpoints"
time="2021-10-21T12:40:30Z" level=info msg="copying binary" name=/var/run/argo-dataflow/kill
time="2021-10-21T12:40:30Z" level=info msg="copying binary" name=/var/run/argo-dataflow/prestop
time="2021-10-21T12:40:30Z" level=info msg="creating authorization file"
time="2021-10-21T12:40:30Z" level=info msg="creating out fifo"
time="2021-10-21T12:40:30Z" level=info msg=cloning checkout=/var/run/argo-dataflow/checkout url="git@github.com:atlanhq/marketplace-scripts"
time="2021-10-21T12:40:30Z" level=info msg="getting secret for auth" SSHPrivateKeySecret="{\"name\":\"git-ssh\",\"key\":\"private-key\"}"
Enumerating objects: 117, done.
Counting objects: 100% (117/117), done.
Compressing objects: 100% (100/100), done.
Total 117 (delta 6), reused 68 (delta 0), pack-reused 0
time="2021-10-21T12:40:34Z" level=info msg="moving checked out code" path=/var/run/argo-dataflow/checkout/marketplace_scripts/query_enrichment wd=/var/run/argo-dataflow/wd
Logs from Sidecar:
time="2021-10-21T12:58:10Z" level=info msg=retry backoff="{\"Duration\":819200000000,\"Factor\":2,\"Jitter\":0.1,\"Steps\":7,\"Cap\":0}" source=default
time="2021-10-21T12:58:10Z" level=info msg="failed to send process message" backoffSteps=7 err="HTTP request failed: \"500 Internal Server Error\" \"Got an unexpected exception:
KeyError, 'queryMetadata'\"" giveUp=false source=default
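To make the retry line above easier to read, here is a minimal sketch that decodes the logged backoff value. It assumes the `Duration` field is in nanoseconds (as Go's `time.Duration` is normally serialized) and that `shlex` is good enough for splitting this logfmt-style line. The 100ms starting delay is purely a hypothetical value for illustration, as the thread does not state the initial duration.

```python
import json
import math
import shlex

# The "retry" line from the sidecar log above, copied verbatim.
LOG_LINE = (
    r'time="2021-10-21T12:58:10Z" level=info msg=retry '
    r'backoff="{\"Duration\":819200000000,\"Factor\":2,\"Jitter\":0.1,\"Steps\":7,\"Cap\":0}" '
    r'source=default'
)


def parse_logfmt(line):
    """Split a key=value log line into a dict; shlex unescapes quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


fields = parse_logfmt(LOG_LINE)
backoff = json.loads(fields["backoff"])

# ASSUMPTION: Duration is in nanoseconds.
seconds = backoff["Duration"] / 1e9

# ASSUMPTION: a hypothetical 100ms starting delay, not stated in the thread.
assumed_initial_ns = 100_000_000
doublings = math.log(backoff["Duration"] / assumed_initial_ns, backoff["Factor"])

print(f"source={fields['source']} steps={backoff['Steps']} delay={seconds:.1f}s")
print(f"doublings from assumed 100ms start: {doublings:.0f}")
```

If those assumptions hold, the current delay is several minutes long and has already doubled many times, which would suggest that the logged `Steps` is a count of remaining attempts on an already running backoff rather than a freshly configured value. This is only one reading of the log, not something the thread confirms.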
So giveUp=true is logged when we give up retrying. It seems that your step has a retry with 7 steps?
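A small sketch for checking sidecar logs for the giveUp field mentioned above. It uses only the failure line quoted in this issue (with the wrapped line rejoined by a space); the helper name `retry_events` is hypothetical and not part of Argo Dataflow.

```python
import shlex

# The "failed to send process message" line from the sidecar log in this issue,
# with the wrapped line rejoined by a single space.
LINE = (
    r'time="2021-10-21T12:58:10Z" level=info msg="failed to send process message" '
    r'backoffSteps=7 err="HTTP request failed: \"500 Internal Server Error\" '
    r"""\"Got an unexpected exception: KeyError, 'queryMetadata'\"" """
    r'giveUp=false source=default'
)


def retry_events(lines):
    """Return (source, backoffSteps, gave_up) for each line that has a giveUp field."""
    events = []
    for line in lines:
        fields = {}
        for token in shlex.split(line):
            key, _, value = token.partition("=")
            fields[key] = value
        if "giveUp" in fields:
            events.append(
                (
                    fields.get("source"),
                    int(fields.get("backoffSteps", "0")),
                    fields["giveUp"] == "true",
                )
            )
    return events


events = retry_events([LINE])
for source, steps, gave_up in events:
    print(f"source={source} backoffSteps={steps} gaveUp={gave_up}")
```

Running this over a full sidecar log would show whether any message ever reaches giveUp=true, and with how many backoff steps left.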