uswitch/kiam

Helm Chart: Enabling Service Monitors Causes Duplicate Metrics

jhwbarlow opened this issue · 1 comment

Enabling the ServiceMonitors causes each metric to be scraped twice.

This is because, to enable the ServiceMonitors, the .Values.agent.prometheus.scrape value must be set to true (see the agent and server chart templates). However, that same value is also what triggers adding the prometheus.io/scrape annotation to the agent service and the server service.

Because the Prometheus Operator creates a scrape target for each ServiceMonitor resource and also for each service carrying the prometheus.io/scrape: "true" annotation, the same metric gets scraped twice, with different labels. For example:

Time Series Value
kiam_sts_cache_hit_total{app="kiam",component="server",instance="10.2.208.3:9620",job="kiam-server",kubernetes_namespace="iam",prometheus="telemetry/po-prom-prometheus"} 183476013
kiam_sts_cache_hit_total{app="kiam",component="server",instance="10.2.32.3:9620",job="kiam-server",kubernetes_namespace="iam",prometheus="telemetry/po-prom-prometheus"} 197057745
kiam_sts_cache_hit_total{endpoint="metrics",instance="10.2.208.3:9620",job="kiam-server",namespace="iam",pod="kiam-server-rgnp4",prometheus="telemetry/po-prom-prometheus",service="kiam-server"} 183476169
kiam_sts_cache_hit_total{endpoint="metrics",instance="10.2.32.3:9620",job="kiam-server",namespace="iam",pod="kiam-server-7wcqp",prometheus="telemetry/po-prom-prometheus",service="kiam-server"} 197057744

The series with the service="kiam-server" label are created by the ServiceMonitor, and the ones without it are created because of the scrape annotation.
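To make the duplication concrete, here is a minimal Python sketch that parses the four series from the table above and groups them by metric name and `instance`, ignoring the labels that differ between the two scrape paths. Any group with more than one member is the same target being scraped by two jobs. The parsing is a simplified assumption that only handles this `name{k="v",...} value` shape (no escaped quotes); it is not Prometheus's own exposition-format parser, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# The four rows from the table above, as "series value" strings.
ROWS = [
    'kiam_sts_cache_hit_total{app="kiam",component="server",instance="10.2.208.3:9620",job="kiam-server",kubernetes_namespace="iam",prometheus="telemetry/po-prom-prometheus"} 183476013',
    'kiam_sts_cache_hit_total{app="kiam",component="server",instance="10.2.32.3:9620",job="kiam-server",kubernetes_namespace="iam",prometheus="telemetry/po-prom-prometheus"} 197057745',
    'kiam_sts_cache_hit_total{endpoint="metrics",instance="10.2.208.3:9620",job="kiam-server",namespace="iam",pod="kiam-server-rgnp4",prometheus="telemetry/po-prom-prometheus",service="kiam-server"} 183476169',
    'kiam_sts_cache_hit_total{endpoint="metrics",instance="10.2.32.3:9620",job="kiam-server",namespace="iam",pod="kiam-server-7wcqp",prometheus="telemetry/po-prom-prometheus",service="kiam-server"} 197057744',
]

# Simplified patterns: metric name, a {...} label block, then the sample value.
SERIES_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse(row):
    """Split one row into (metric name, label dict, numeric value)."""
    m = SERIES_RE.match(row)
    if m is None:
        raise ValueError(f"unrecognised series: {row!r}")
    labels = dict(LABEL_RE.findall(m.group("labels")))
    return m.group("name"), labels, float(m.group("value"))


def find_duplicates(rows):
    """Group series by (metric name, instance); groups with more than one
    member are the same target scraped by more than one job."""
    groups = defaultdict(list)
    for row in rows:
        name, labels, _value = parse(row)
        groups[(name, labels["instance"])].append(labels)
    return {key: sets for key, sets in groups.items() if len(sets) > 1}


def origin(labels):
    # Per the issue: series carrying a service label come from the
    # ServiceMonitor, the others from the prometheus.io/scrape annotation.
    return "ServiceMonitor" if "service" in labels else "annotation"


if __name__ == "__main__":
    for (name, instance), label_sets in sorted(find_duplicates(ROWS).items()):
        print(name, instance, sorted(origin(l) for l in label_sets))
```

Each instance shows up once per scrape path, which is the double counting described above.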

I believe that the creation of the ServiceMonitors should not be dependent on the .Values.agent.prometheus.scrape value. This value should only be used to toggle the addition of the scrape annotations to the services.

The ServiceMonitors must depend on .Values.agent.prometheus.scrape because, currently, that value does more than just add the annotation: it primarily sets up the metrics endpoint.

The Prometheus Operator stopped using the scrape annotation on pods in favor of ServiceMonitors, as they are more configurable. So if your Prometheus is scraping both annotation-based targets and ServiceMonitors, then either you've added an additionalScrapeConfigs entry to your deployment or you are still running an older version.

Ideally, when ServiceMonitors are enabled, the service annotations should not be added.
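The rule above, combined with the point that the scrape value is what sets up the metrics endpoint, can be sketched as plain decision logic. This is a minimal Python sketch, not the chart's actual templates. The `servicemonitor_enabled` flag, the dataclasses and the function names are hypothetical stand-ins; the "current" function assumes, as the issue describes, that the ServiceMonitor requires `prometheus.scrape` to be true while the annotation is added whenever `prometheus.scrape` is true.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PrometheusValues:
    """Hypothetical per-component values (agent or server)."""
    scrape: bool                  # stands in for .Values.<component>.prometheus.scrape
    servicemonitor_enabled: bool  # hypothetical toggle for creating the ServiceMonitor


@dataclass(frozen=True)
class Rendered:
    metrics_endpoint: bool   # expose the metrics port on the Service
    scrape_annotation: bool  # add prometheus.io/scrape: "true" to the Service
    service_monitor: bool    # create a ServiceMonitor resource


def current_behaviour(v: PrometheusValues) -> Rendered:
    # Assumed from the issue: scrape gates the ServiceMonitor AND always adds
    # the annotation, so enabling both yields two scrape jobs.
    return Rendered(
        metrics_endpoint=v.scrape,
        scrape_annotation=v.scrape,
        service_monitor=v.scrape and v.servicemonitor_enabled,
    )


def proposed_behaviour(v: PrometheusValues) -> Rendered:
    # scrape still sets up the metrics endpoint; the annotation is only
    # added when no ServiceMonitor is being created.
    monitor = v.scrape and v.servicemonitor_enabled
    return Rendered(
        metrics_endpoint=v.scrape,
        scrape_annotation=v.scrape and not monitor,
        service_monitor=monitor,
    )


def double_scraped(r: Rendered) -> bool:
    """True when both scrape paths would target the same Service."""
    return r.scrape_annotation and r.service_monitor


if __name__ == "__main__":
    for scrape, sm in product([False, True], repeat=2):
        v = PrometheusValues(scrape=scrape, servicemonitor_enabled=sm)
        print(v, "current:", double_scraped(current_behaviour(v)),
              "proposed:", double_scraped(proposed_behaviour(v)))
```

Under the proposed rule, no combination of the two flags renders both the annotation and the ServiceMonitor, while turning on scrape alone keeps annotation-based scraping working as before.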