kubernetes/kubernetes

InPlacePodVerticalScaling: pod no longer meets the requirement for a qosClass of Guaranteed after shrinking memory

itonyli opened this issue · 11 comments

What happened?

After shrinking memory with InPlacePodVerticalScaling, the pod no longer meets the requirement for its qosClass to be Guaranteed.

What did you expect to happen?

InPlacePodVerticalScaling should keep the Pod's qosClass the same before and after scaling.

How can we reproduce it (as minimally and precisely as possible)?

After enabling the InPlacePodVerticalScaling feature gate, patch the container's resource requests and limits to a value smaller than the container's current memory usage.
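
For illustration only (the pod and container names here are placeholders, not taken from the original setup), a patch of that shape looks like:

$ kubectl patch pod my-pod --patch '{"spec":{"containers":[{"name":"my-ctr","resources":{"requests":{"memory":"50Mi"},"limits":{"memory":"50Mi"}}}]}}'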

Anything else we need to know?

No response

Kubernetes version

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.1", GitCommit:"3ddd0f45aa91e2f30c70734b175631bec5b5825a", GitTreeState:"clean", BuildDate:"2022-05-24T12:17:11Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.2", GitCommit:"4b8e819355d791d96b7e9d9efe4cbafae2311c88", GitTreeState:"clean", BuildDate:"2024-02-14T22:24:00Z", GoVersion:"go1.21.7", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

OS version


Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

/sig node

I can't reproduce this issue on the same 1.29 version. My pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: test-in-place-vpa-pod
spec:
  containers:
  - name: nginx
    image: nginx:1.15.4
    resizePolicy:
    - resourceName: "cpu"
      restartPolicy: "NotRequired"
    - resourceName: "memory"
      restartPolicy: "NotRequired"
    resources:
      limits:
        cpu: "0.1"
        memory: "100M"
      requests:
        cpu: "0.1"
        memory: "100M"

then patch pod:

kubectl patch pod test-in-place-vpa-pod --patch '{"spec":{"containers":[{"name":"nginx","resources":{"limits":{"cpu":"0.1","memory":"100m"},"requests":{"cpu":"0.1","memory":"50m"}}}]}}'

I got this error:

The Pod "test-in-place-vpa-pod" is invalid:
* spec.containers[0].resources.requests: Invalid value: "100m": must be less than or equal to memory limit of 50m
* metadata: Invalid value: "Guaranteed": Pod QoS is immutable

You can test with the memory resource.
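
For anyone trying to reproduce with memory, one way to first push the container's usage above the target limit (illustrative only; tail buffers the whole unterminated stream, so this holds roughly 150M for ten minutes):

$ kubectl exec test-in-place-vpa-pod -- sh -c '{ head -c 150M /dev/zero; sleep 600; } | tail >/dev/null'

With usage held high, shrinking the memory request and limit below it should hit the behavior described in this issue.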


Could you please provide your patch command?

I reproduced this issue on v1.30.

  1. Create a namespace and a pod following the documentation:

    $ kubectl create namespace qos-example
    $ kubectl create -f https://k8s.io/examples/pods/qos/qos-pod-5.yaml
    
  2. Update the memory limit and request to a very low value (1Mi):

    $ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"1Mi"}, "limits":{"memory":"1Mi"}}}]}}'
    
  3. Confirm the container status:

    $ kubectl -n qos-example get pod qos-demo-5 -o json | jq ".status.containerStatuses[0].resources"
    {
      "limits": {
        "cpu": "700m",
        "memory": "200Mi"
      },
      "requests": {
        "cpu": "700m",
        "memory": "1Mi"
      }
    }
    

Then, I noticed that status.resize is "InProgress":

$ kubectl -n qos-example get pod qos-demo-5 -o json | jq ".status.resize"
"InProgress"

Though I'm not familiar with this feature, I guess the runtime is still trying to resize the pod. This issue seems to be caused by the updated value being too small to be practical.
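
One way to check that guess is to compare the cgroup's current memory usage against the requested limit (assuming the node uses cgroup v2; the path differs on v1):

$ kubectl -n qos-example exec qos-demo-5 -- cat /sys/fs/cgroup/memory.current

If memory.current is well above the 1Mi target, the kubelet cannot lower memory.max that far without risking an OOM kill, which would explain the resize staying "InProgress".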

After this, I applied another patch with a practical value:

$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"100Mi"}, "limits":{"memory":"100Mi"}}}]}}'
pod/qos-demo-5 patched

Then, the pod was resized with the later patch:

$ kubectl -n qos-example get pod qos-demo-5 -o json | jq ".status.containerStatuses[0].resources"
{
  "limits": {
    "cpu": "700m",
    "memory": "100Mi"
  },
  "requests": {
    "cpu": "700m",
    "memory": "100Mi"
  }
}

This issue can be worked around with another patch, so I don't think it causes a big problem.

Nice catch. I also tried with the latest K8s; interestingly, anything lower than 14Mi failed in my tests as well. I'm not sure whether the bug is in K8s or outside of it (in the container runtime).

Definitely worth checking this more deeply. Thanks for sharing, @hshiina. It seems there is a bug somewhere.

$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"15Mi"}, "limits":{"memory":"15Mi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
15728640 <-- success, has 15Mi
$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"20Mi"}, "limits":{"memory":"20Mi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
20971520 <-- success, has 20Mi
$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"15Mi"}, "limits":{"memory":"15Mi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
15728640 <-- success, has 15Mi
$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"14Mi"}, "limits":{"memory":"14Mi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
14680064 <-- success, has 14Mi
$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"13Mi"}, "limits":{"memory":"13Mi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
14680064 <-- failed, still has 14Mi
$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"15Mi"}, "limits":{"memory":"15Mi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
15728640 <-- success, has 15Mi
$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"14Mi"}, "limits":{"memory":"14Mi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
14680064 <-- success, has 14Mi
$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"1Gi"}, "limits":{"memory":"1Gi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
1073741824 <-- success, has 1Gi
$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"13Mi"}, "limits":{"memory":"13Mi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
1073741824 <-- failed, still has 1Gi
$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"12Mi"}, "limits":{"memory":"12Mi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
1073741824 <-- failed, still has 1Gi
$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"15Mi"}, "limits":{"memory":"15Mi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
15728640 <-- success, has 15Mi
$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"14.5Mi"}, "limits":{"memory":"14.5Mi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
15204352 <-- success, has 14.5Mi
$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"14.1Mi"}, "limits":{"memory":"14.1Mi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
14782464 <-- success, has 14.1Mi
$ kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"memory":"13.1Mi"}, "limits":{"memory":"13.1Mi"}}}]}}'
pod/qos-demo-5 patched
$ kubectl exec qos-demo-5 --namespace=qos-example -- cat /sys/fs/cgroup/memory.max
14782464 <-- failed, still has 14.1Mi

As I noted in a comment here, the resize failed in the kubelet:

klog.ErrorS(nil, "Aborting attempt to set pod memory limit less than current memory usage", "pod", pod.Name)
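
This error is only written to the kubelet log on the node, so it is easy to miss from the API side. Assuming a systemd-managed kubelet, something like this should surface it:

$ journalctl -u kubelet | grep "Aborting attempt to set pod memory limit"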

/triage accepted

/cc @tallclair @vinaykul

Perhaps the root cause is an attempt to resize pod memory to a value lower than the memory currently in use? Should InPlacePodVerticalScaling handle this corner case, either proactively or after the fact? Wouldn't the Linux kernel reject such an attempt anyway?

  1. QoS class is determined by the spec, not the status, so the QoS class has not changed (it continues to be Guaranteed in the case above; see the check after this list).
  2. Setting memory limits below memory use will result in an error response from the CRI. Trying to determine memory use at validation time would require taking on the significant complexity of bringing the stats API into the validation path; the juice is not worth the squeeze, imho. The expectation is that an entity such as the VPA will set limits above use with a reasonable buffer. Perhaps we can consider logging a message into the event stream to better surface the error. Any takers?
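
To double-check point 1 with the reproduction pod from earlier in the thread, the class the API reports can be read directly; it stays Guaranteed even while the resize is stuck, because it is derived from the spec (requests == limits for every container):

$ kubectl -n qos-example get pod qos-demo-5 -o jsonpath='{.status.qosClass}'
Guaranteed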