msg="target not found" for standard kube-state-metrics
I am trying to set up the stackdriver-prometheus-sidecar to push a few CronJob/Job metrics from kube-state-metrics to Stackdriver. I'm running into an issue where, no matter what I do, all of the metrics report:
level=debug ts=2021-04-06T22:10:39.947Z caller=series_cache.go:369 component="Prometheus reader" msg="target not found" labels="{__name__=\"kube_cronjob_next_schedule_time\",container=\"kube-state-metrics\",cronjob=\"cronjob\",endpoint=\"http\",instance=\"10.8.6.2:8080\",job=\"kube-state-metrics\",namespace=\"production\",pod=\"kube-prometheus-stack-kube-state-metrics-bbf56d7f5-dss8c\",service=\"kube-prometheus-stack-kube-state-metrics\"}"
Here is my config for the sidecar:
- args:
    - --stackdriver.project-id=<project>
    - --prometheus.wal-directory=/prometheus/wal
    - --stackdriver.kubernetes.location=us-central1
    - --stackdriver.kubernetes.cluster-name=<cluster>
    - --include=kube_cronjob_next_schedule_time{namespace="production"}
    - --log.level=debug
  image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.8.2
I am using the Prometheus operator, with Prometheus version 2.18. I tried a couple of different versions (up to 2.22) with no luck.
I am not seeing any metrics reach Stackdriver. I've tried adding --stackdriver.store-in-files-directory=/prometheus/sd and I see a file get created, but nothing is written to it, so it doesn't seem to be a permissions issue there.
For the --include flag, I've tried a number of different forms with no luck.
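If I understand the flag correctly, it parses standard Prometheus metric selectors, so both of the forms below should express the same filter (a sketch, not an exhaustive list of what I tried):
    - --include=kube_cronjob_next_schedule_time{namespace="production"}
    - --include={__name__="kube_cronjob_next_schedule_time",namespace="production"}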
I found #104, which highlights a similar log message, but I think that use case is a bit more complex than this one.
I dug into the code a bit and determined what the issue is, but I'm not sure how it could be fixed given how the code works today.
The issue stems from the target lookup, specifically getting a target from the cache. We make a call
t, _ := targetMatch(ts, lset)
that attempts to "return the first target in the entry that matches all labels of the input set iff it has them set." Prometheus targets have a namespace label, and for kube-state-metrics deployments that namespace will, in most cases, not be the same as the namespace of the workloads it monitors. This leads to a scenario where targetMatch iterates over the list of targets matching the job and instance labels of the metric, checks that all remaining labels match, and fails to match on namespace because kube-state-metrics is not in the same namespace as the workload.
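To make the mismatch concrete, here is roughly what is being compared for the series in my log above (the monitoring namespace for kube-state-metrics is hypothetical; the point is simply that it is not production):
    # labels on the scraped series (from the debug log above):
    series:
      job: kube-state-metrics
      instance: 10.8.6.2:8080
      namespace: production     # namespace of the CronJob the metric describes
    # labels on the Prometheus target that scraped it:
    target:
      job: kube-state-metrics
      instance: 10.8.6.2:8080
      namespace: monitoring     # hypothetical: wherever kube-state-metrics itself runs
    # targetMatch requires that, for every series label the target also has set,
    # the values agree; the namespace values differ, so no target is found.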
I have fixed this by just deploying kube-state-metrics in my production namespace, as that covers my use case. This is almost certainly not viable for all cases; for example, workloads spread across multiple namespaces would make this tricky, as you'd have to deploy multiple copies of kube-state-metrics. Filtering the namespace label out of targetMatch seems hacky, so I'm hesitant to suggest that.
I have had the same problem with this sidecar and kube-state-metrics. In my case, the only solution I have found is to modify the Prometheus ServiceMonitor (I am using https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/templates ).
The ServiceMonitor that the chart generates for the kube-state-metrics scrape hard-codes honorLabels to true:
https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/exporters/kube-state-metrics/serviceMonitor.yaml
Changing it to false means that, when the namespace label conflict occurs, two labels are generated (see the sketch after this list):
- namespace: kept as the name of the namespace where I have Prometheus and kube-state-metrics deployed
- exported_namespace: the namespace of the object monitored by kube-state-metrics
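For reference, a minimal sketch of what that change looks like on the rendered ServiceMonitor; the metadata and selector here are illustrative (taken from my install's naming), and only honorLabels is the point:
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: kube-prometheus-stack-kube-state-metrics
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: kube-state-metrics
      endpoints:
        - port: http
          # the chart template sets this to true; with false, Prometheus keeps the
          # target's namespace label and renames the scraped one to exported_namespace
          honorLabels: false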
I have not reviewed all of the metrics, but I suppose some will exceed the 10-label limit because of this; perhaps in such cases a relabeling can be applied to drop the labels I do not need.
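Something along these lines on the same endpoint is what I have in mind; the dropped labels are only an example of ones I might not need:
    endpoints:
      - port: http
        honorLabels: false
        metricRelabelings:
          # drop target-added labels I don't need so the series stays under
          # Stackdriver's 10-label limit (the regex is just an example)
          - action: labeldrop
            regex: (container|endpoint)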
Building on @forestoden and @vmcalvo's findings, my recent comment in #229 might be relevant as well.