Status manager does not normalize ephemeral container statuses
hshiina opened this issue · 6 comments
What happened?
EphemeralContainerStatuses is not normalized in normalizeStatus() in the status manager:
kubernetes/pkg/kubelet/status/status_manager.go
Lines 1026 to 1043 in cade1dd
This does not seem to cause any user-facing problem, because EphemeralContainerStatuses is sorted before being passed to the status manager, and the timestamps are normalized when the status is marshaled to create a patch:
kubernetes/pkg/kubelet/kubelet_pods.go
Line 2336 in cade1dd
However, the unnormalized timestamps do cause unexpected behavior internally. If the log verbosity is 3 or higher, kubelet periodically logs the following messages while an ephemeral container exists:
May 14 20:10:33 kind-control-plane kubelet[720]: I0514 20:10:33.437155 720 status_manager.go:230] "Syncing all statuses"
May 14 20:10:33 kind-control-plane kubelet[720]: I0514 20:10:33.438512 720 status_manager.go:984] "Pod status is inconsistent with cached status for pod, a reconciliation should be triggered" pod="default/ephemeral-demo" statusDiff=<
May 14 20:10:33 kind-control-plane kubelet[720]: &v1.PodStatus{
May 14 20:10:33 kind-control-plane kubelet[720]: ... // 8 identical fields
May 14 20:10:33 kind-control-plane kubelet[720]: PodIPs: {{IP: "10.244.0.5"}},
May 14 20:10:33 kind-control-plane kubelet[720]: StartTime: s"2024-05-14 20:09:21 +0000 UTC",
May 14 20:10:33 kind-control-plane kubelet[720]: - InitContainerStatuses: nil,
May 14 20:10:33 kind-control-plane kubelet[720]: + InitContainerStatuses: []v1.ContainerStatus{},
May 14 20:10:33 kind-control-plane kubelet[720]: ContainerStatuses: {{Name: "ephemeral-demo", State: {Running: &{StartedAt: {Time: s"2024-05-14 20:09:23 +0000 UTC"}}}, Ready: true, Image: "registry.k8s.io/pause:3.1", ...}},
May 14 20:10:33 kind-control-plane kubelet[720]: QOSClass: "BestEffort",
May 14 20:10:33 kind-control-plane kubelet[720]: EphemeralContainerStatuses: []v1.ContainerStatus{
May 14 20:10:33 kind-control-plane kubelet[720]: {
May 14 20:10:33 kind-control-plane kubelet[720]: Name: "debugger-qk87g",
May 14 20:10:33 kind-control-plane kubelet[720]: State: v1.ContainerState{
May 14 20:10:33 kind-control-plane kubelet[720]: Waiting: nil,
May 14 20:10:33 kind-control-plane kubelet[720]: - Running: &v1.ContainerStateRunning{StartedAt: v1.Time{Time: s"2024-05-14 20:09:35 +0000 UTC"}},
May 14 20:10:33 kind-control-plane kubelet[720]: + Running: &v1.ContainerStateRunning{StartedAt: v1.Time{Time: s"2024-05-14 20:09:35.697513091 +0000 UTC"}},
May 14 20:10:33 kind-control-plane kubelet[720]: Terminated: nil,
May 14 20:10:33 kind-control-plane kubelet[720]: },
May 14 20:10:33 kind-control-plane kubelet[720]: LastTerminationState: {},
May 14 20:10:33 kind-control-plane kubelet[720]: Ready: false,
May 14 20:10:33 kind-control-plane kubelet[720]: ... // 7 identical fields
May 14 20:10:33 kind-control-plane kubelet[720]: },
May 14 20:10:33 kind-control-plane kubelet[720]: },
May 14 20:10:33 kind-control-plane kubelet[720]: Resize: "",
May 14 20:10:33 kind-control-plane kubelet[720]: ResourceClaimStatuses: nil,
May 14 20:10:33 kind-control-plane kubelet[720]: }
May 14 20:10:33 kind-control-plane kubelet[720]: >
This message is emitted here:
kubernetes/pkg/kubelet/status/status_manager.go
Lines 984 to 988 in cade1dd
After needsReconcile() returns true, syncPod() is called. This does not look too harmful, because unchanged eventually becomes true:
kubernetes/pkg/kubelet/status/status_manager.go
Lines 873 to 881 in cade1dd
The API is still called once unnecessarily in syncPod(), though:
What did you expect to happen?
It might be better to normalize ephemeral container statuses, at least their timestamps, in the status manager to avoid this unexpected behavior.
How can we reproduce it (as minimally and precisely as possible)?
Set the log verbosity of kubelet to 3 or higher. Then, create an ephemeral container like this.
Anything else we need to know?
No response
Kubernetes version
$ kubectl version
Client Version: v1.29.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2
This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/sig node