knative-extensions/eventing-kafka-broker

Creation of KafkaSource configmap kafka-source-dispatcher-0 takes really long after restart

joke opened this issue · 1 comments

joke commented

Describe the bug

When using a KafkaSource, the configmap kafka-source-dispatcher-0 for the StatefulSet isn't reconciled correctly. It is missing for quite some time (a couple of minutes to an hour).

As the events below show, this seems to happen quite often when the pod of the StatefulSet is restarted:

0s          Normal    Killing                        pod/kafka-source-dispatcher-0                      Stopping container istio-proxy
0s          Normal    Killing                        pod/kafka-source-dispatcher-0                      Stopping container kafka-source-dispatcher
0s          Warning   Unhealthy                      pod/kafka-source-dispatcher-0                      Readiness probe failed: Get "http://100.64.130.230:15020/app-health/kafka-source-dispatcher/readyz": dial tcp 100.64.130.230:15020: connect: connection refused
0s          Warning   Unhealthy                      pod/kafka-source-dispatcher-0                      Readiness probe failed: Get "http://100.64.130.230:15020/app-health/kafka-source-dispatcher/readyz": dial tcp 100.64.130.230:15020: connect: connection refused
0s          Warning   Unhealthy                      pod/kafka-source-dispatcher-0                      Readiness probe failed: Get "http://100.64.130.230:15020/app-health/kafka-source-dispatcher/readyz": dial tcp 100.64.130.230:15020: connect: connection refused
0s          Warning   Unhealthy                      pod/kafka-source-dispatcher-0                      Readiness probe failed: Get "http://100.64.130.230:15020/app-health/kafka-source-dispatcher/readyz": dial tcp 100.64.130.230:15020: connect: connection refused
0s          Warning   RecreatingFailedPod            statefulset/kafka-source-dispatcher                StatefulSet knative-eventing/kafka-source-dispatcher is recreating failed Pod kafka-source-dispatcher-0
0s          Normal    SuccessfulDelete               statefulset/kafka-source-dispatcher                delete Pod kafka-source-dispatcher-0 in StatefulSet kafka-source-dispatcher successful
0s          Warning   RecreatingFailedPod            statefulset/kafka-source-dispatcher                StatefulSet knative-eventing/kafka-source-dispatcher is recreating failed Pod kafka-source-dispatcher-0
0s          Normal    SuccessfulDelete               statefulset/kafka-source-dispatcher                delete Pod kafka-source-dispatcher-0 in StatefulSet kafka-source-dispatcher successful
0s          Warning   RecreatingFailedPod            statefulset/kafka-source-dispatcher                StatefulSet knative-eventing/kafka-source-dispatcher is recreating failed Pod kafka-source-dispatcher-0
0s          Normal    SuccessfulDelete               statefulset/kafka-source-dispatcher                delete Pod kafka-source-dispatcher-0 in StatefulSet kafka-source-dispatcher successful
0s          Warning   RecreatingFailedPod            statefulset/kafka-source-dispatcher                StatefulSet knative-eventing/kafka-source-dispatcher is recreating failed Pod kafka-source-dispatcher-0
0s          Warning   FailedDelete                   statefulset/kafka-source-dispatcher                delete Pod kafka-source-dispatcher-0 in StatefulSet kafka-source-dispatcher failed error: pods "kafka-source-dispatcher-0" not found
0s          Normal    Scheduled                      pod/kafka-source-dispatcher-0                      Successfully assigned knative-eventing/kafka-source-dispatcher-0 to ip-10-191-61-24.eu-central-1.compute.internal
0s          Normal    SuccessfulCreate               statefulset/kafka-source-dispatcher                create Pod kafka-source-dispatcher-0 in StatefulSet kafka-source-dispatcher successful
0s          Warning   FailedMount                    pod/kafka-source-dispatcher-0                      MountVolume.SetUp failed for volume "contract-resources" : configmap "kafka-source-dispatcher-0" not found
0s          Warning   FailedMount                    pod/kafka-source-dispatcher-0                      MountVolume.SetUp failed for volume "contract-resources" : configmap "kafka-source-dispatcher-0" not found
0s          Warning   FailedMount                    pod/kafka-source-dispatcher-0                      MountVolume.SetUp failed for volume "contract-resources" : configmap "kafka-source-dispatcher-0" not found

The configmap is missing and the pod is stuck in initialization:

  Normal   Scheduled    34m                   default-scheduler  Successfully assigned knative-eventing/kafka-source-dispatcher-0 to ip-10-191-61-24.eu-central-1.compute.internal
  Warning  FailedMount  3m50s (x23 over 34m)  kubelet            MountVolume.SetUp failed for volume "contract-resources" : configmap "kafka-source-dispatcher-0" not found

After 35 minutes the configmap was created; the eventing-webhook logged the CREATE admission request, which came from the kafka-controller service account:

{"level":"info","ts":"2024-07-31T08:39:22.535Z","logger":"eventing-webhook","caller":"webhook/admission.go:151","msg":"remote admission controller audit annotations=map[string]string(nil)","commit":"b5528fc","knative.dev/pod":"eventing-webhook-76d8567f75-hz5gn","knative.dev/kind":"/v1, Kind=ConfigMap","knative.dev/namespace":"knative-eventing","knative.dev/name":"kafka-source-dispatcher-0","knative.dev/operation":"CREATE","knative.dev/resource":"/v1, Resource=configmaps","knative.dev/subresource":"","knative.dev/userinfo":"system:serviceaccount:knative-eventing:kafka-controller","admissionreview/uid":"790d7496-6507-4a43-a4e3-d6a3cc3544c5","admissionreview/allowed":true,"admissionreview/result":"nil"}

and a little later it was updated correctly:

{"level":"info","ts":"2024-07-31T08:41:12.378Z","logger":"eventing-webhook","caller":"webhook/admission.go:151","msg":"remote admission controller audit annotations=map[string]string(nil)","commit":"b5528fc","knative.dev/pod":"eventing-webhook-76d8567f75-hz5gn","knative.dev/kind":"/v1, Kind=ConfigMap","knative.dev/namespace":"knative-eventing","knative.dev/name":"kafka-source-dispatcher-0","knative.dev/operation":"UPDATE","knative.dev/resource":"/v1, Resource=configmaps","knative.dev/subresource":"","knative.dev/userinfo":"system:serviceaccount:knative-eventing:kafka-controller","admissionreview/uid":"889b423f-6e24-494f-a84e-b63caf45e54b","admissionreview/allowed":true,"admissionreview/result":"nil"}
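To put a number on "a little later", the gap between the two admission log lines can be computed from their `ts` fields. This is a minimal sketch: the two strings are trimmed copies of the log lines above (only the fields used are kept), and `parse_admission` is a hypothetical helper name, not part of Knative.

```python
import json
from datetime import datetime

# Trimmed copies of the two eventing-webhook log lines above; only the
# fields used here are kept.
CREATE_LINE = '{"ts":"2024-07-31T08:39:22.535Z","knative.dev/name":"kafka-source-dispatcher-0","knative.dev/operation":"CREATE"}'
UPDATE_LINE = '{"ts":"2024-07-31T08:41:12.378Z","knative.dev/name":"kafka-source-dispatcher-0","knative.dev/operation":"UPDATE"}'


def parse_admission(line):
    """Return (operation, timestamp) for one JSON admission log line."""
    entry = json.loads(line)
    # %z accepts the trailing "Z" (UTC) on Python 3.7 and later.
    ts = datetime.strptime(entry["ts"], "%Y-%m-%dT%H:%M:%S.%f%z")
    return entry["knative.dev/operation"], ts


create_op, create_ts = parse_admission(CREATE_LINE)
update_op, update_ts = parse_admission(UPDATE_LINE)
gap_seconds = (update_ts - create_ts).total_seconds()
print(f"{create_op} -> {update_op}: {gap_seconds:.3f}s")
```

For the two lines above this gives a gap of roughly 110 seconds between the CREATE and the UPDATE.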

It seems the pod removes the configmap during termination and the eventing webhook doesn't create a new one.

As a workaround, specifying terminationGracePeriodSeconds: 0 seems to prevent the pod from deleting the configmap.
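One way to apply that workaround is a merge patch on the StatefulSet's pod template. The sketch below only builds and prints the `kubectl patch` command. It assumes the field is set at `spec.template.spec` of the `kafka-source-dispatcher` StatefulSet in `knative-eventing` (the report does not say how the field was set), and a controller or operator that manages the StatefulSet may revert the change.

```python
import json
import shlex

# Names taken from the events above.
NAMESPACE = "knative-eventing"
STATEFULSET = "kafka-source-dispatcher"

# Assumption: the workaround is applied on the pod template of the StatefulSet.
patch = {"spec": {"template": {"spec": {"terminationGracePeriodSeconds": 0}}}}

# Build a shell-safe command line; nothing is executed here.
command = shlex.join([
    "kubectl", "patch", "statefulset", STATEFULSET,
    "-n", NAMESPACE,
    "--type", "merge",
    "-p", json.dumps(patch),
])
print(command)
```

Note that a grace period of 0 kills the containers immediately, so in-flight work in the dispatcher is not drained. That fits a temporary workaround but not a fix.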

The problem might be related to #3995.

Expected behavior
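The configmap kafka-source-dispatcher-0 should still exist, or be recreated promptly, after the pod restarts, so that the contract-resources volume can be mounted and the pod starts.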

To Reproduce

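  1. Create a KafkaSource
  2. Restart the StatefulSet kafka-source-dispatcher
  3. Wait for the configmap kafka-source-dispatcher-0 to reappear (this can take from a couple of minutes to an hour)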
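The steps above can be timed with a small script. This is a rough sketch under stated assumptions: `kubectl` is on the PATH with access to the cluster, the resource names are the ones from the events above, and `configmap_exists`, `wait_until`, and `reproduce` are hypothetical helper names. It restarts the StatefulSet, waits for the configmap to disappear, then measures how long it takes to come back.

```python
import subprocess
import time
from typing import Callable, Optional

NAMESPACE = "knative-eventing"
STATEFULSET = "kafka-source-dispatcher"
CONFIGMAP = "kafka-source-dispatcher-0"


def configmap_exists() -> bool:
    """True if kubectl can fetch the configmap (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "configmap", CONFIGMAP, "-n", NAMESPACE],
        capture_output=True,
    )
    return result.returncode == 0


def wait_until(check: Callable[[], bool], timeout_s: float,
               interval_s: float = 5.0) -> Optional[float]:
    """Poll check() until it is True; return elapsed seconds, or None on timeout."""
    start = time.monotonic()
    while True:
        if check():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)


def reproduce() -> None:
    """Restart the StatefulSet and report how long the configmap is missing."""
    subprocess.run(
        ["kubectl", "rollout", "restart", f"statefulset/{STATEFULSET}", "-n", NAMESPACE],
        check=True,
    )
    # The configmap may survive the first seconds of the restart, so wait for
    # it to go missing before timing its return.
    if wait_until(lambda: not configmap_exists(), timeout_s=300) is None:
        print("configmap never went missing; the issue did not reproduce")
        return
    missing_for = wait_until(configmap_exists, timeout_s=3600)
    if missing_for is None:
        print("configmap still missing after one hour")
    else:
        print(f"configmap was missing for {missing_for:.0f}s")
```

Calling `reproduce()` against a test cluster runs the whole sequence; the one-hour timeout matches the upper bound reported above.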

Knative release version

serving v1.13.3
eventing v1.13.3
kafka-broker v1.13.6

Additional context

#4027 will help with getting the statefulset to become ready before it's scaled by the autoscaler