image-gc-high-threshold should be lower than value causing hard eviction nodefs.available or imagefs.available

Question

jhrcz-ls opened this issue a month ago · 7 comments

What happened?

When the disk fills up, it hits the hard eviction threshold and the node goes into disk pressure at the same moment that image GC notices it should prune something and starts acting. As a result, the node enters disk pressure and evicts pods instead of simply starting image GC early enough.

What did you expect to happen?

I expect image GC to notice the filling disk early enough to start garbage collection and free up disk space before the node hits disk pressure.

From the current documentation:

https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

--image-gc-high-threshold int32     Default: 85
--image-gc-low-threshold int32     Default: 80

and

https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/

    "imageGCHighThresholdPercent": 85,
    "imageGCLowThresholdPercent": 80,

https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#hard-eviction-thresholds

vs.

nodefs.available<10%
imagefs.available<15%

... it does not make sense for the image GC high threshold to sit at exactly the same disk usage as the imagefs.available hard eviction threshold (100 - 15 = 85 :-)
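The arithmetic behind this overlap can be sketched as follows. This is only an illustration of the percentages quoted above, not kubelet logic; the helper names `eviction_usage_percent` and `gc_headroom` are hypothetical and do not exist in Kubernetes.

```python
def eviction_usage_percent(available_threshold_percent: float) -> float:
    """Disk usage (%) at which an `available<X%` hard eviction signal fires."""
    return 100.0 - available_threshold_percent


def gc_headroom(image_gc_high_percent: float, imagefs_available_percent: float) -> float:
    """Percentage points of disk usage between image GC starting and hard eviction."""
    return eviction_usage_percent(imagefs_available_percent) - image_gc_high_percent


# Defaults quoted above: image GC starts at 85% usage, eviction at imagefs.available<15%.
print(gc_headroom(85, 15))  # 0.0

# Values proposed in this issue: image GC starts at 80% usage.
print(gc_headroom(80, 15))  # 5.0
```

With the documented defaults the headroom is zero, so image GC and hard eviction are triggered at the same disk usage. Lowering the high threshold to 80 leaves five percentage points for garbage collection to act first.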

A better approach would be to shift the default values slightly, equivalent to setting:

  - "--image-gc-high-threshold=80"  
  - "--image-gc-low-threshold=75"

... after setting this, I almost never get node disk pressure, because garbage collection prunes the disk early enough.
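For clusters that configure the kubelet through a config file rather than flags, the same thresholds could presumably be expressed with the `imageGCHighThresholdPercent` and `imageGCLowThresholdPercent` fields quoted above. A minimal sketch, assuming the `kubelet.config.k8s.io/v1beta1` `KubeletConfiguration` format and leaving all other fields at their defaults:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Start image garbage collection at 80% disk usage instead of the default 85%.
imageGCHighThresholdPercent: 80
# Garbage-collect down toward 75% disk usage instead of the default 80%.
imageGCLowThresholdPercent: 75
```

On a kubeadm installation, how this fragment is merged into the existing kubelet configuration depends on the setup, so treat it as a starting point rather than a drop-in file.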

How can we reproduce it (as minimally and precisely as possible)?

Fill the disk and observe that the node enters the disk pressure state at the same moment it starts garbage collecting.

Anything else we need to know?

No response

Kubernetes version

This is not version specific. I checked the current documentation, and these defaults have been the same for years.

Cloud provider

none - kubeadm installation

OS version

not relevant

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

vaibhav2107 commented a month ago

/sig node

Answer 1 · 2024-05-14T12:47:05.000Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.
