[kube-prometheus-stack] Dashboards do not account for multiple instances of the kube-state-metrics
dvidben opened this issue · 1 comments
Describe the bug a clear and concise description of what the bug is.
When kube-prometheus-stack is deployed with multiple instances of kube-state-metrics, the following dashboards show incorrect summary data (e.g. values are double-counted when replicas = 2):
- Cluster dashboard
  - CPU Quota: Pods column
  - Memory: Pods column
- Compute Resources / Namespace (Workloads)
  - CPU Quota: CPU requests and CPU limits columns
  - Memory Quota: Memory requests and Memory limits columns
Other dashboards can be impacted as well.
What's your helm version?
version.BuildInfo{Version:"v3.12.0", GitCommit:"c9f554d75773799f72ceef38c51210f1842a1dea", GitTreeState:"clean", GoVersion:"go1.20.3"}
What's your kubectl version?
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"clean", BuildDate:"2022-07-13T14:30:46Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"} Kustomize Version: v4.5.4 Server Version: version.Info{Major:"1", Minor:"28+", GitVersion:"v1.28.9-eks-036c24b", GitCommit:"f75443c988661ca0a6dfa0dc01ea82dd42d31278", GitTreeState:"clean", BuildDate:"2024-04-30T23:54:04Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"linux/amd64"}
Which chart?
kube-prometheus-stack
What's the chart version?
58.2.1
What happened?
The underlying bug is caused by the following PromQL queries, which are affected by running multiple instances of kube-state-metrics:
- In the Cluster dashboard (https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/grafana/dashboards-1.14/k8s-resources-cluster.yaml):
  sum(kube_pod_owner{job="kube-state-metrics", cluster=""}) by (namespace)
- In the Workload dashboard (https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/grafana/dashboards-1.14/k8s-resources-workloads-namespace.yaml):
  kube_pod_container_resource_requests{job="kube-state-metrics", cluster="$cluster", namespace="$namespace", resource="cpu"}
Running multiple replicas of kube-state-metrics causes these metrics to be reported multiple times, once per replica, each with a different instance label value.
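To illustrate the double counting, here is a minimal Python sketch that mimics `sum(...) by (namespace)` over samples reported by two replicas. The sample data, the instance addresses, and the helper name `sum_by_namespace` are made up for illustration; this is not how Prometheus evaluates the query internally.

```python
from collections import defaultdict

# Hypothetical kube_pod_owner samples (the value is always 1), as scraped from
# two kube-state-metrics replicas. Only the "instance" label differs.
samples = [
    {"namespace": "default", "pod": "web-0", "instance": "10.0.0.1:8080", "value": 1},
    {"namespace": "default", "pod": "web-1", "instance": "10.0.0.1:8080", "value": 1},
    {"namespace": "default", "pod": "web-0", "instance": "10.0.0.2:8080", "value": 1},
    {"namespace": "default", "pod": "web-1", "instance": "10.0.0.2:8080", "value": 1},
]


def sum_by_namespace(series):
    """Mimic `sum(...) by (namespace)`: add up every series in each namespace."""
    totals = defaultdict(int)
    for s in series:
        totals[s["namespace"]] += s["value"]
    return dict(totals)


pods_per_namespace = sum_by_namespace(samples)
print(pods_per_namespace)  # {'default': 4}, although only 2 pods exist
```

Because the aggregation drops the instance label and sums everything else, each pod contributes once per replica.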
What you expected to happen?
Metrics to be properly calculated even if kube-state-metrics is replicated.
How to reproduce it?
Install kube-prometheus-stack with the kube-state-metrics section of the values modified to set replicas = 2.
Enter the changed values of values.yaml?
kube-state-metrics:
  replicas: 2
...
Enter the command that you execute and failing/misfunctioning.
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 58.2.1 --values values.yaml
Anything else we need to know?
Some other
Suggested PromQL to fix this bug:
- Cluster dashboard:
count (count (kube_pod_owner{job="kube-state-metrics", cluster="$cluster"}) by (namespace, pod)) by (namespace)
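As a rough check of why the nested count helps, the Python sketch below mimics the two aggregation steps on made-up data. The inner `count ... by (namespace, pod)` collapses the per-replica duplicates into one group per pod, and the outer `count ... by (namespace)` counts those groups. The tuples and the helper name `count_pods_by_namespace` are hypothetical and only model the label sets, not Prometheus itself.

```python
from collections import defaultdict

# Hypothetical (namespace, pod, instance) label sets for kube_pod_owner,
# with every pod reported by two kube-state-metrics replicas.
samples = [
    ("default", "web-0", "10.0.0.1:8080"),
    ("default", "web-1", "10.0.0.1:8080"),
    ("kube-system", "coredns-0", "10.0.0.1:8080"),
    ("default", "web-0", "10.0.0.2:8080"),
    ("default", "web-1", "10.0.0.2:8080"),
    ("kube-system", "coredns-0", "10.0.0.2:8080"),
]


def count_pods_by_namespace(series):
    # Inner step, `count(...) by (namespace, pod)`: one group per distinct
    # pod, no matter how many replicas reported it.
    inner_groups = {(namespace, pod) for namespace, pod, _instance in series}
    # Outer step, `count(...) by (namespace)`: number of inner groups
    # (i.e. distinct pods) in each namespace.
    result = defaultdict(int)
    for namespace, _pod in inner_groups:
        result[namespace] += 1
    return dict(result)


pods_per_namespace = count_pods_by_namespace(samples)
print(pods_per_namespace)  # 2 pods in "default", 1 in "kube-system"
```

The result no longer depends on the number of replicas, since the instance label is discarded before anything is counted.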
Related issues: prometheus-operator/kube-prometheus#997