Terminating and stuck broker pod cannot be deleted by Koperator
bartam1 opened this issue · 0 comments
Describe the bug
A few times on a GKE v1.22 Kubernetes cluster, I have seen that when a Kafka cluster has been up for days, some of the broker pods get stuck in the Terminating state, e.g.:
NAME            READY   STATUS        RESTARTS   AGE     NODE
kafka-0-czldf   0/1     Terminating   0          3h36m   pool1-12efb0bb-7nxy
NAME                  STATUS   ROLES    AGE     VERSION
pool1-12efb0bb-6qbg   Ready    <none>   3h36m   v1.22.12-gke.500
The kafka-0-czldf pod has an age of 3h36m, and the pool1-12efb0bb-6qbg node has the same age.
I think the problem is caused by a node going down: when a new node comes up later, the broker pod remains stuck.
In the broker pod I can see terminated containers:
State: Terminated
Reason: Completed
Exit Code: 0
The broker pod's deletionTimestamp is "2022-09-25T20:56:10Z".
Because the broker pod is stuck and cannot be deleted, Koperator cannot recreate it, so the Kafka cluster does not work properly.
Steps to reproduce the issue:
I'm going to try to reproduce the issue by hand.
Expected behavior
Koperator checks the broker pods, and when a pod has a terminated container, Koperator deletes the pod and recreates it.