Scripts not resilient to gateway restarts
deepthidevaki opened this issue · 7 comments
Here the script finds one gateway, and then it tries to exec into that gateway. Between the execution of these two lines, however, the gateway pod was terminated and a new pod was started to replace it. The script still tried to access the terminated gateway, eventually timed out, and failed the experiment.
It might make sense to use the service to be more resilient, or to use the helper retryUntilSuccess as we do here.
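A minimal sketch of the "use the service" option, assuming the gateway is exposed through a Kubernetes Service. `kubectl exec` accepts a `TYPE/NAME` reference such as `svc/<name>` and resolves it to one of the Service's backing pods each time the command runs, so the script never holds on to a stale pod name. The Service name `zeebe-gateway`, the `GATEWAY_SERVICE` and `NAMESPACE` variables, and the function name are hypothetical.

```shell
# Sketch: exec through the gateway Service instead of a fixed pod name.
# "zeebe-gateway" is an assumed default Service name (hypothetical);
# override it with GATEWAY_SERVICE. NAMESPACE must be set by the caller.
createInstanceViaService() {
  # kubectl picks one of the Service's pods at call time, so a gateway pod
  # that was replaced earlier is never referenced by its old name.
  kubectl exec "svc/${GATEWAY_SERVICE:-zeebe-gateway}" -n "$NAMESPACE" \
    -- zbctl create instance benchmark --insecure
}
```

This combines naturally with retryUntilSuccess: each retry re-resolves the Service, so a failed attempt against a dying pod does not doom the next one.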
An instance where this happened:
2022-04-21 04:34:21.442 CEST chaos-worker
"++ kubectl exec zeebe-gateway-c7fdf4f5c-v7mzz -n 0b25276f-1113-4627-9c17-5b867256e62a-zeebe -- zbctl create instance benchmark --insecure"
Debug
2022-04-21 04:34:21.505 CEST chaos-worker
"error: cannot exec into a container in a completed pod; current phase is Failed"
The pod zeebe-gateway-c7fdf4f5c-v7mzz was terminated before this time.
> Or use the helper retryUntilSuccess as we do here
It is already using retryUntilSuccess. The problem is that it keeps retrying to connect to the same terminated gateway.
Yeah, because getGateway is not included in the loop.
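A minimal sketch of moving the gateway lookup into the retried command, so every attempt resolves whichever gateway pod exists at that moment. The bodies of retryUntilSuccess and getGateway below are simplified, hypothetical stand-ins, not the project's actual helpers; the `createInstanceOnGateway` function, the `MAX_RETRIES`, `RETRY_DELAY` and `NAMESPACE` variables, and the label selector are assumptions as well.

```shell
# Hypothetical, simplified helpers; not the project's actual implementations.

# Runs "$@" until it succeeds, at most MAX_RETRIES times (default 5),
# sleeping RETRY_DELAY seconds (default 1) between attempts.
retryUntilSuccess() {
  local attempt=1
  local max="${MAX_RETRIES:-5}"
  until "$@"; do
    if (( attempt >= max )); then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-1}"
  done
  return 0
}

# Prints the name of a gateway pod in the Running phase.
# The label selector is an assumption about the chart's pod labels.
getGateway() {
  kubectl get pods -n "$NAMESPACE" \
    -l app.kubernetes.io/component=zeebe-gateway \
    --field-selector=status.phase=Running \
    -o jsonpath='{.items[0].metadata.name}'
}

# The gateway is looked up *inside* the retried function, so a pod that
# was replaced between attempts is picked up on the next try.
createInstanceOnGateway() {
  local gateway
  gateway=$(getGateway) || return 1
  [ -n "$gateway" ] || return 1
  kubectl exec "$gateway" -n "$NAMESPACE" -- zbctl create instance benchmark --insecure
}
```

Usage: `retryUntilSuccess createInstanceOnGateway`. The important detail is that retryUntilSuccess receives a function that performs the lookup, not a pod name that was resolved once before the loop started.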
Why don't we execute zbctl on the chaos worker? We have the authenticationDetails for the cluster available in the process variables.
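A minimal sketch of that alternative, running zbctl on the chaos worker itself. It assumes the address and OAuth client credentials have already been extracted from the authenticationDetails process variable into the hypothetical shell variables `GATEWAY_ADDRESS`, `CLIENT_ID`, `CLIENT_SECRET` and `AUTH_SERVER_URL`; the variable's real structure is not shown in this thread, and the function name is hypothetical too. `ZEEBE_ADDRESS`, `ZEEBE_CLIENT_ID`, `ZEEBE_CLIENT_SECRET` and `ZEEBE_AUTHORIZATION_SERVER_URL` are the environment variables the Zeebe Go client, and therefore zbctl, reads for its connection settings.

```shell
# Sketch: create an instance from the chaos worker instead of exec'ing into a pod.
# GATEWAY_ADDRESS, CLIENT_ID, CLIENT_SECRET and AUTH_SERVER_URL are hypothetical
# variables assumed to be filled from the authenticationDetails process variable.
createInstanceFromWorker() {
  # The prefix assignments set these variables only for this zbctl call.
  # --insecure is omitted on the assumption that the target gateway uses TLS.
  ZEEBE_ADDRESS="$GATEWAY_ADDRESS" \
  ZEEBE_CLIENT_ID="$CLIENT_ID" \
  ZEEBE_CLIENT_SECRET="$CLIENT_SECRET" \
  ZEEBE_AUTHORIZATION_SERVER_URL="$AUTH_SERVER_URL" \
    zbctl create instance benchmark
}
```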
Currently, the scripts are independent of where and against what they are executed: locally, with Helm, in Cloud/SaaS, etc.