Azure/azure-workload-identity

Webhook pods are restarting with probe failures

Skyere opened this issue · 0 comments

Skyere commented

Describe the bug
Under load, the webhook pods restart with the following events:

  Warning  Unhealthy  23m (x2 over 26m)   kubelet            Readiness probe failed: Get "http://10.160.7.232:9440/readyz": dial tcp 10.160.7.232:9440: connect: connection refused
  Warning  Unhealthy  14m (x6 over 26m)   kubelet            Readiness probe failed: Get "http://10.160.7.232:9440/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    14m (x10 over 26m)  kubelet            Back-off restarting failed container manager in pod azure-wi-webhook-controller-manager-586dc676d-scgpn_kube-system(0e0b518e-530c-43cc-999b-303672b7628d)

There are no issues when only a few pods are created, or when pod creation is spaced out over time.
Steps To Reproduce
Create 10 or more pods at the same time.
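A minimal sketch of one way to generate such a burst, assuming the webhook is triggered by the `azure.workload.identity/use: "true"` pod label. The `build_pod_list` helper, the `wi-load-test` name prefix, the `default` namespace, and the pause image are hypothetical choices for illustration and are not taken from the original report:

```python
import json


def build_pod_list(count, namespace="default", name_prefix="wi-load-test"):
    """Return a Kubernetes v1 List holding `count` minimal Pod manifests."""
    items = []
    for i in range(count):
        items.append({
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {
                "name": f"{name_prefix}-{i}",
                "namespace": namespace,
                # Assumption: this label is what routes the pod through
                # the workload identity mutating webhook.
                "labels": {"azure.workload.identity/use": "true"},
            },
            "spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "pause",
                    # Hypothetical image choice; any small image would do.
                    "image": "registry.k8s.io/pause:3.9",
                }],
            },
        })
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Emit JSON so the whole batch can be submitted in one command.
    print(json.dumps(build_pod_list(12), indent=2))
```

If saved as `gen_pods.py` (a hypothetical file name), the output can be piped to `kubectl apply -f -`, which submits the pods in quick succession rather than strictly simultaneously.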
Expected behavior
Webhook pods do not restart because of probe failures.
Logs
I was able to get logs from a container only once:

{"level":"debug","timestamp":"2023-10-29T10:58:28.992365Z","logger":"controller-runtime.webhook.webhooks","caller":"/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/webhook/admission/http.go:96$admission.(*Webhook).ServeHTTP","message":"received request","webhook":"/mutate-v1-pod","UID":"c6e290a4-f532-425a-b0dd-455063c027c8","kind":"/v1, Kind=Pod","resource":{"group":"","version":"v1","resource":"pods"}}
{"level":"debug","timestamp":"2023-10-29T10:58:48.902144Z","logger":"controller-runtime.healthz","caller":"/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/healthz/healthz.go:60$healthz.(*Handler).serveAggregated","message":"healthz check failed","checker":"readyz","error":"webhook server is not reachable: context deadline exceeded"}
{"level":"info","timestamp":"2023-10-29T10:59:05.496663Z","caller":"/usr/local/go/src/log/log.go:194$log.(*Logger).Output","message":"http: TLS handshake error from 127.0.0.1:54936: EOF"}
{"level":"debug","timestamp":"2023-10-29T10:59:07.900988Z","logger":"controller-runtime.healthz","caller":"/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/healthz/healthz.go:60$healthz.(*Handler).serveAggregated","message":"healthz check failed","checker":"readyz","error":"webhook server is not reachable: context deadline exceeded"}
{"level":"info","timestamp":"2023-10-29T10:59:01.001668Z","logger":"controller-runtime.healthz","caller":"/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/healthz/healthz.go:128$healthz.writeStatusesAsText","message":"healthz check failed","statuses":[{}]}
{"level":"info","timestamp":"2023-10-29T10:59:26.902616Z","caller":"/usr/local/go/src/log/log.go:194$log.(*Logger).Output","message":"http: TLS handshake error from 127.0.0.1:54342: read tcp 127.0.0.1:9443->127.0.0.1:54342: i/o timeout"}
{"level":"debug","timestamp":"2023-10-29T10:59:27.003707Z","logger":"controller-runtime.healthz","caller":"/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/healthz/healthz.go:60$healthz.(*Handler).serveAggregated","message":"healthz check failed","checker":"readyz","error":"webhook server is not reachable: dial tcp :9443: i/o timeout"}
{"level":"info","timestamp":"2023-10-29T10:59:31.905952Z","logger":"controller-runtime.healthz","caller":"/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/healthz/healthz.go:128$healthz.writeStatusesAsText","message":"healthz check failed","statuses":[{}]}
{"level":"debug","timestamp":"2023-10-29T10:59:38.999699Z","logger":"controller-runtime.healthz","caller":"/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/healthz/healthz.go:60$healthz.(*Handler).serveAggregated","message":"healthz check failed","checker":"readyz","error":"webhook server is not reachable: dial tcp :9443: i/o timeout"}
{"level":"info","timestamp":"2023-10-29T10:59:47.198457Z","caller":"/usr/local/go/src/log/log.go:194$log.(*Logger).Output","message":"http: TLS handshake error from 127.0.0.1:44212: EOF"}
{"level":"info","timestamp":"2023-10-29T10:59:57.994511Z","logger":"controller-runtime.healthz","caller":"/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/healthz/healthz.go:128$healthz.writeStatusesAsText","message":"healthz check failed","statuses":[{}]}
{"level":"info","timestamp":"2023-10-29T10:59:49.392236Z","logger":"controller-runtime.healthz","caller":"/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/healthz/healthz.go:128$healthz.writeStatusesAsText","message":"healthz check failed","statuses":[{}]}
{"level":"debug","timestamp":"2023-10-29T11:00:11.911428Z","logger":"controller-runtime.healthz","caller":"/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/healthz/healthz.go:60$healthz.(*Handler).serveAggregated","message":"healthz check failed","checker":"readyz","error":"webhook server is not reachable: context deadline exceeded"}
{"level":"info","timestamp":"2023-10-29T11:00:16.896439Z","caller":"/usr/local/go/src/log/log.go:194$log.(*Logger).Output","message":"http: TLS handshake error from 127.0.0.1:49882: EOF"}
{"level":"debug","timestamp":"2023-10-29T11:00:29.811357Z","logger":"controller-runtime.healthz","caller":"/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/healthz/healthz.go:60$healthz.(*Handler).serveAggregated","message":"healthz check failed","checker":"readyz","error":"webhook server is not reachable: dial tcp :9443: i/o timeout"}
{"level":"info","timestamp":"2023-10-29T11:00:38.304028Z","caller":"/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/manager/internal.go:581$manager.(*controllerManager).engageStopProcedure.func3","message":"Stopping and waiting for non leader election runnables"}
{"level":"info","timestamp":"2023-10-29T11:00:34.904461Z","logger":"controller-runtime.healthz","caller":"/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/healthz/healthz.go:128$healthz.writeStatusesAsText","message":"healthz check failed","statuses":[{}]}

Environment

  • Kubernetes version (use kubectl version): v1.27.3
  • Cloud provider or hardware configuration: AKS