[kube-prometheus-stack] Dashboards do not account for multiple instances of the kube-state-metrics

Question

dvidben opened this issue 5 months ago · 1 comments

Describe the bug

When kube-prometheus-stack is deployed with multiple instances of kube-state-metrics, the following dashboards show incorrect summary data (i.e. values are double-counted when replicas = 2):

Cluster dashboard:
  CPU Quota: Pods column
  Memory: Pods column
Compute Resources / Namespace (Workloads):
  CPU Quota: CPU requests and CPU limits columns
  Memory Quota: Memory requests and Memory limits columns

Other dashboards can be impacted as well.

What's your helm version?

version.BuildInfo{Version:"v3.12.0", GitCommit:"c9f554d75773799f72ceef38c51210f1842a1dea", GitTreeState:"clean", GoVersion:"go1.20.3"}

What's your kubectl version?

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"clean", BuildDate:"2022-07-13T14:30:46Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"28+", GitVersion:"v1.28.9-eks-036c24b", GitCommit:"f75443c988661ca0a6dfa0dc01ea82dd42d31278", GitTreeState:"clean", BuildDate:"2024-04-30T23:54:04Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"linux/amd64"}

Which chart?

kube-prometheus-stack

What's the chart version?

58.2.1

What happened?

The underlying bug is caused by the following PromQL queries, which return inflated results when multiple instances of kube-state-metrics are running.

In the Cluster dashboard (https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/grafana/dashboards-1.14/k8s-resources-cluster.yaml)
sum(kube_pod_owner{job="kube-state-metrics", cluster=""}) by (namespace)
In the Workload dashboard (https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/grafana/dashboards-1.14/k8s-resources-workloads-namespace.yaml)
kube_pod_container_resource_requests{job="kube-state-metrics", cluster="$cluster", namespace="$namespace", resource="cpu"}

Running multiple replicas of kube-state-metrics causes these metrics to be reported multiple times, once per replica, each with a different instance label value.
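
One way to confirm that this is what is happening is to count how many distinct instances report the same metric. The query below is only a diagnostic sketch and is not taken from the chart; it assumes the replicas differ in the instance label, as described above, and that the job label is kube-state-metrics as in the dashboard queries.

```promql
# Number of distinct kube-state-metrics instances reporting kube_pod_owner.
# A result greater than 1 means every pod series exists once per replica.
count(
  count by (instance) (kube_pod_owner{job="kube-state-metrics"})
)
```

With replicas = 2 this would be expected to return 2, which matches the double counting seen in the dashboards.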

What you expected to happen?

Metrics to be properly calculated even if kube-state-metrics is replicated.

How to reproduce it?

Install kube-prometheus-stack with a modified kube-state-metrics section (replicas = 2).

Enter the changed values of values.yaml?

kube-state-metrics:
  replicas: 2
  ...

Enter the command that you execute and failing/misfunctioning.

helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 58.2.1 --values values.yaml

Anything else we need to know?

Some other

Suggested PromQL to fix this bug:

Cluster dashboard:
count (count (kube_pod_owner{job="kube-state-metrics", cluster="$cluster"}) by (namespace, pod)) by (namespace)
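
A comparable approach might work for the Workload dashboard selector: collapse the per-replica series before any further aggregation. This is a sketch rather than the chart's actual fix, and it assumes the duplicated series differ only in the instance label; any other per-replica label added by the scrape configuration would need to be listed as well.

```promql
# Drop the instance label and keep one value per remaining label set,
# so each container's CPU request is counted once regardless of replica count.
max without (instance) (
  kube_pod_container_resource_requests{job="kube-state-metrics", cluster="$cluster", namespace="$namespace", resource="cpu"}
)
```

Because every replica reports the same value for a given container, max (or min) returns that value unchanged, whereas sum would add the replicas together.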

Answer 1 · 2024-06-18T23:00:07.000Z

Related issues: prometheus-operator/kube-prometheus#997