kubelet crash loop panic with SIGSEGV
garymm opened this issue · 6 comments
What happened?
I was running some pods on my node as usual.
At some point (23:41:35.127989 in the log), the kubelet crashed and then went into a crash loop.
Log showing the first crash and several thereafter: kubelet-crash.txt
The errors I see shortly before the crash are:
projected.go:292] Couldn't get configMap default/kube-root-ca.crt: object "default"/"kube-root-ca.crt" not registered
projected.go:198] Error preparing data for projected volume kube-api-access-lwp5v for pod default/run-fluid-wtw5n-99kwf: object "default"/"kube-root-ca.crt" not registered
nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/projected/e6cfc866-a5b2-4e16-9b83-884de8552d45-kube-api-access-lwp5v podName:e6cfc866-a5b2-4e16-9b83-884de8552d45 nodeName:}" failed. No retries permitted until 2024-05-13 23:41:38.8756138 +0000 UTC m=+9176.959811707 (durationBeforeRetry 2m2s). Error: MountVolume.SetUp failed for volume "kube-api-access-lwp5v" (UniqueName: "kubernetes.io/projected/e6cfc866-a5b2-4e16-9b83-884de8552d45-kube-api-access-lwp5v") pod "run-fluid-wtw5n-99kwf" (UID: "e6cfc866-a5b2-4e16-9b83-884de8552d45") : object "default"/"kube-root-ca.crt" not registered
Stack trace from the first crash (later stack traces are a bit different):
goroutine 401 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3d005c0?, 0x6d89410})
vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x480e338?})
vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x3d005c0, 0x6d89410})
/usr/local/go/src/runtime/panic.go:884 +0x213
errors.As({0x12, 0x0}, {0x3eac4c0?, 0xc0014e6590?})
/usr/local/go/src/errors/wrap.go:109 +0x215
k8s.io/apimachinery/pkg/util/net.IsConnectionReset(...)
vendor/k8s.io/apimachinery/pkg/util/net/util.go:45
k8s.io/client-go/rest.(*Request).request.func2(0x6d2e8a0?, {0x12, 0x0})
vendor/k8s.io/client-go/rest/request.go:1007 +0x79
k8s.io/client-go/rest.IsRetryableErrorFunc.IsErrorRetryable(...)
vendor/k8s.io/client-go/rest/with_retry.go:43
k8s.io/client-go/rest.(*withRetry).IsNextRetry(0xc001587f40, {0x0?, 0x0?}, 0x0?, 0xc001ef5f00, 0xc001bb1560, {0x12, 0x0}, 0x480e328)
vendor/k8s.io/client-go/rest/with_retry.go:169 +0x170
k8s.io/client-go/rest.(*Request).request.func3(0xc001bb1560, 0xc002913af8, {0x4c44840?, 0xc001587f40?}, 0x0?, 0x0?, 0x39efc40?, {0x12?, 0x0?}, 0x480e328)
vendor/k8s.io/client-go/rest/request.go:1042 +0xba
k8s.io/client-go/rest.(*Request).request(0xc00199f200, {0x4c42e00, 0xc00227b0e0}, 0x2?)
vendor/k8s.io/client-go/rest/request.go:1048 +0x4e5
k8s.io/client-go/rest.(*Request).Do(0xc00199f200, {0x4c42dc8, 0xc000196010})
vendor/k8s.io/client-go/rest/request.go:1063 +0xc9
k8s.io/client-go/kubernetes/typed/core/v1.(*nodes).Get(0xc001db6740, {0x4c42dc8, 0xc000196010}, {0x7ffe6a355c42, 0x1d}, {{{0x0, 0x0}, {0x0, 0x0}}, {0x4c03920, ...}})
vendor/k8s.io/client-go/kubernetes/typed/core/v1/node.go:77 +0x145
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).tryUpdateNodeStatus(0xc000289400, {0x4c42dc8, 0xc000196010}, 0x44ead4?)
pkg/kubelet/kubelet_node_status.go:561 +0xf6
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).updateNodeStatus(0xc000289400, {0x4c42dc8, 0xc000196010})
pkg/kubelet/kubelet_node_status.go:536 +0xfc
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).syncNodeStatus(0xc000289400)
pkg/kubelet/kubelet_node_status.go:526 +0x105
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc002913f28?)
vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x4c19ae0, 0xc0008f2000}, 0x1, 0xc000180360)
vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x2540be400, 0x3fa47ae147ae147b, 0x0?, 0x0?)
vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89
created by k8s.io/kubernetes/pkg/kubelet.(*Kubelet).Run
pkg/kubelet/kubelet.go:1606 +0x58a
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1a pc=0x47a935]
What did you expect to happen?
Ideally no crash, but at least a useful error message.
How can we reproduce it (as minimally and precisely as possible)?
Really not sure.
Anything else we need to know?
No response
Kubernetes version
Client Version: v1.28.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.6
Cloud provider
Bare metal.
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
Install tools
kubespray
Container runtime (CRI) and version (if applicable)
containerd
containerd github.com/containerd/containerd v1.7.13 7c3aca7a610df76212171d200ca3811ff6096eb8
Related plugins (CNI, CSI, ...) and versions (if applicable)
CNI = calico v3.26.4
This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/sig node
errors.As panics if target is not a non-nil pointer to either a type that implements error or to any interface type; it returns false if err is nil. But syscall.Errno clearly implements error, so a *syscall.Errno target should be valid here.
kubernetes/staging/src/k8s.io/apimachinery/pkg/util/net/util.go, lines 44 to 47 at be3af46
If we assume you're using Go 1.20.13 to build this version, then it would panic there: https://github.com/golang/go/blob/a95136a88cb8a51ede3ec2cdca4cfa3962dcfacd/src/errors/wrap.go#L109
Which Go version did you use to build this Kubernetes version?
According to the changelog, Kubernetes v1.28.6 (what I'm using) is built with Go 1.21.7.
/assign @rphillips
/priority important-soon
@garymm do you have a more succinct reproducer to make it easier to find the issue?
Not really, sorry. The only unusual thing that may be relevant:
This happened on a control plane node, and I don't have any taints on my control plane nodes, so it was running normal pods as well as control plane pods. No idea if that matters.