gardener/hvpa-controller

[BUG] [prometheus-operator/config-reloader] memory.limit_in_bytes: device or resource busy

dansible opened this issue · 5 comments

What happened:
Related to this issue in Gardener: gardener/gardener#2163

When deploying prometheus-operator onto a seed, we noticed that the config-reloader container can fail with this error:

  Warning  Failed                  15m (x3 over 15m)  kubelet, ip-10-242-5-86.ap-southeast-2.compute.internal  Error: failed to start container "prometheus-config-reloader": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:367: setting cgroup config for procHooks process caused \\\"failed to write 5242880 to memory.limit_in_bytes: write /sys/fs/cgroup/memory/kubepods/burstable/poda0ef03a6-6461-4648-b0f3-ef75b1738582/prometheus-config-reloader/memory.limit_in_bytes: device or resource busy\\\"\"": unknown

This is a known issue with prometheus-operator:
helm/charts#11447
prometheus-operator/prometheus-operator#2409
prometheus-operator/prometheus-operator#2424

Anything else we need to know?:

The solution seems to be to increase the memory limit for the config-reloader container to at least 25Mi.

Environment:

  • Gardener version: landscape-live-garden:0.64.0
  • Kubernetes version (use kubectl version): kubectl: v1.18.0, kubernetes: v1.15.11
  • Cloud provider or hardware configuration: AWS
  • Hvpa-controller version/commit ID: v0.2.5

Interesting. This can be closed with gardener/gardener#2164, right?

I would try to fix this in upstream VPA by adding a new flag that sets a minimum memory value for all recommendations as well.

I am not sure I understand. How would that flag differ from this:

// Specifies the minimal amount of resources that will be recommended
// for the container. The default is no minimum.
// +optional
MinAllowed v1.ResourceList `json:"minAllowed,omitempty" protobuf:"bytes,3,rep,name=minAllowed,casttype=ResourceList,castkey=ResourceName"`

You have to set this configuration on every container in the cluster, which might or might not be applicable for your cluster (it depends on the container runtime). It's better to set this configuration in only one place.

IMHO, it is unrealistic to expect the same min value to work for all containers. I would vote for setting minAllowed for the individual components. Hence, closing this issue.
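
For illustration, here is a minimal sketch of what a per-container minimum amounts to: a recommendation below the configured floor is raised to the floor, and containers without a floor are left alone. This is not the VPA or hvpa-controller implementation. The names `minAllowedBytes` and `clampMemory` are hypothetical, and the only value taken from this thread is the 25Mi floor for the config-reloader.

```go
package main

import "fmt"

// Mi is one mebibyte in bytes, matching the Kubernetes "Mi" suffix.
const Mi int64 = 1024 * 1024

// minAllowedBytes is a hypothetical per-container memory floor.
// Only the 25Mi entry for the config-reloader comes from this thread.
var minAllowedBytes = map[string]int64{
	"prometheus-config-reloader": 25 * Mi,
}

// clampMemory raises a recommended memory value to the container's floor
// and returns it unchanged when no floor is configured for that container.
func clampMemory(container string, recommended int64) int64 {
	if floor, ok := minAllowedBytes[container]; ok && recommended < floor {
		return floor
	}
	return recommended
}

func main() {
	// 5242880 bytes (5Mi) is the limit that failed in the kubelet event above.
	fmt.Println(clampMemory("prometheus-config-reloader", 5*Mi))
	// A container without an entry keeps its recommendation.
	fmt.Println(clampMemory("prometheus", 5*Mi))
}
```

Keeping the floor in a per-container map mirrors the per-component minAllowed approach favoured in the closing comment. A single cluster-wide flag would correspond to applying one floor to every container instead.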