process_start_time_seconds Metric Collected Twice Due to ServiceMonitor Changes in Commit 43f2094
With the latest update (commit 74e445a), the ServiceMonitor for the kube-apiserver has changed. This change is causing issues, as the `process_start_time_seconds` metric is now being collected twice: once from the `/metrics` path and once from the `/metrics/slis` path. Additionally, it is somewhat confusing to see that the newly added configuration for scraping metrics from the `/metrics/slis` path is set to run every 5 seconds, while the `/metrics` path is scraped every 30 seconds.
```yaml
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  interval: 5s
  path: /metrics/slis
  port: https
  scheme: https
  tlsConfig:
    caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    serverName: kubernetes
```
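For comparison, the pre-existing endpoint entry for the `/metrics` path, which uses the 30-second interval, looks roughly like this (a sketch; the exact fields in the generated manifest may differ):

```yaml
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  interval: 30s
  # path defaults to /metrics when not set
  port: https
  scheme: https
  tlsConfig:
    caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    serverName: kubernetes
```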
Debug log from our Prometheus server:
```
ts=2024-09-03T08:51:39.348Z caller=scrape.go:1760 level=debug component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-apiserver/0 target=https://10.34.28.164:443/metrics msg="Out of order sample" series=process_start_time_seconds
```
Metrics on the related paths:
```
rgarcia$ curl -s -k -H "Authorization: Bearer $(cat tokenfile-scn)" https://10.34.28.164:443/metrics/slis | grep process_start_time_seconds
# HELP process_start_time_seconds [ALPHA] Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.72534614475e+09
rgarcia$ curl -s -k -H "Authorization: Bearer $(cat tokenfile-scn)" https://10.34.28.164:443/metrics | grep process_start_time_seconds
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.72534615776e+09
```
It seems that the changes were introduced by commit 43f2094 from @dgrisonnet.
I would suggest dropping `process_start_time_seconds` from the `/metrics/slis` path to avoid further issues.
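For illustration, a drop rule via `metricRelabelings` on that endpoint could look roughly like this (a sketch only; the actual fix may take a different form):

```yaml
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  interval: 5s
  path: /metrics/slis
  port: https
  scheme: https
  metricRelabelings:
    # Drop the duplicate metric from this endpoint; it is already scraped via /metrics.
    - sourceLabels: [__name__]
      regex: process_start_time_seconds
      action: drop
  tlsConfig:
    caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    serverName: kubernetes
```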
Thank you for reporting this; I didn't run into this issue whilst testing locally, so I am glad you reported it.
The reasoning behind the different scrape interval is that the `/metrics/slis` endpoint exposes far fewer metrics, but ones that are important to scrape at a high frequency. You can read more about this fairly new endpoint at https://kubernetes.io/docs/reference/instrumentation/slis/.
It appears that `process_start_time_seconds` was intentionally added to address kubernetes/kubernetes#122520, but looking at it again, it seems that we might've been wrong.
I'll update the ServiceMonitor to drop the metric and follow up in Kubernetes on whether we should fully drop this metric or not.