msg="target not found" for standard kube-state-metrics
I am trying to set up the stackdriver-prometheus-sidecar to push a few CronJob/Job metrics from kube-state-metrics to Stackdriver. I'm running into an issue where, no matter what I do, all of the metrics report:
level=debug ts=2021-04-06T22:10:39.947Z caller=series_cache.go:369 component="Prometheus reader" msg="target not found" labels="{__name__=\"kube_cronjob_next_schedule_time\",container=\"kube-state-metrics\",cronjob=\"cronjob\",endpoint=\"http\",instance=\"10.8.6.2:8080\",job=\"kube-state-metrics\",namespace=\"production\",pod=\"kube-prometheus-stack-kube-state-metrics-bbf56d7f5-dss8c\",service=\"kube-prometheus-stack-kube-state-metrics\"}"
Here is my config for the sidecar:
- args:
    - --stackdriver.project-id=<project>
    - --prometheus.wal-directory=/prometheus/wal
    - --stackdriver.kubernetes.location=us-central1
    - --stackdriver.kubernetes.cluster-name=<cluster>
    - --include=kube_cronjob_next_schedule_time{namespace="production"}
    - --log.level=debug
  image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.8.2
I am using the Prometheus operator, with Prometheus version 2.18. I tried a couple of different versions (up to 2.22) with no luck.
I am not seeing any metrics reach Stackdriver. I've tried adding --stackdriver.store-in-files-directory=/prometheus/sd and I see a file get created, but nothing is written to it, so it doesn't seem to be a permissions issue there.
For the --include flag, I've tried a number of different forms with no luck.
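If I understand the flag correctly, it parses standard Prometheus metric selectors, so both of the forms below should express the same filter (a sketch, not an exhaustive list of what I tried):
    - --include=kube_cronjob_next_schedule_time{namespace="production"}
    - --include={__name__="kube_cronjob_next_schedule_time",namespace="production"}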
I found #104, which highlights a similar log message, but I think that use case is a bit more complex than this one.
I dug into the code a bit and determined what the issue is, but I'm not sure how it could be fixed given how the code works today.
The issue stems from the target lookup, specifically getting a target from the cache. We make a call
t, _ := targetMatch(ts, lset)
that attempts to "return the first target in the entry that matches all labels of the input set iff it has them set." Prometheus targets have a namespace label, and for kube-state-metrics deployments that namespace will, in most cases, not be the same as the namespace of the workloads it monitors. This leads to a scenario where targetMatch iterates over the list of targets matching the job and instance labels of the metric, checks that all remaining labels match, and fails to match on namespace because kube-state-metrics is not in the same namespace as the workload.
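To make the mismatch concrete, here is roughly what is being compared for the series in my log above (the monitoring namespace for kube-state-metrics is hypothetical; the point is simply that it is not production):
    # labels on the scraped series (from the debug log above):
    series:
      job: kube-state-metrics
      instance: 10.8.6.2:8080
      namespace: production     # namespace of the CronJob the metric describes
    # labels on the Prometheus target that scraped it:
    target:
      job: kube-state-metrics
      instance: 10.8.6.2:8080
      namespace: monitoring     # hypothetical: wherever kube-state-metrics itself runs
    # targetMatch requires that, for every series label the target also has set,
    # the values agree; the namespace values differ, so no target is found.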
I have fixed this by just deploying kube-state-metrics in my production namespace, as that covers my use case. This is almost certainly not viable for all cases; for example, workloads spread across multiple namespaces would make this tricky, as you'd have to deploy multiple copies of kube-state-metrics. Filtering the namespace label out of targetMatch seems hacky, so I'm hesitant to suggest that.
I have had the same problem with this sidecar and kube-state-metrics. In my case, the only solution I have found is to modify the Prometheus ServiceMonitor (I am using https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/templates ).
The ServiceMonitor that the chart generates for the kube-state-metrics scrape hard-codes honorLabels to true:
https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/exporters/kube-state-metrics/serviceMonitor.yaml
Changing it to false means that, when the namespace label conflict occurs, two labels are generated (see the sketch after this list):
- namespace: kept as the name of the namespace where I have Prometheus and kube-state-metrics deployed
- exported_namespace: the namespace of the object monitored by kube-state-metrics
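For reference, a minimal sketch of what that change looks like on the rendered ServiceMonitor; the metadata and selector here are illustrative (taken from my install's naming), and only honorLabels is the point:
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: kube-prometheus-stack-kube-state-metrics
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: kube-state-metrics
      endpoints:
        - port: http
          # the chart template sets this to true; with false, Prometheus keeps the
          # target's namespace label and renames the scraped one to exported_namespace
          honorLabels: false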
I have not reviewed all of the metrics, but I suppose some will exceed the 10-label limit because of this; perhaps in such cases a relabeling can be applied to drop the labels I do not need.
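Something along these lines on the same endpoint is what I have in mind; the dropped labels are only an example of ones I might not need:
    endpoints:
      - port: http
        honorLabels: false
        metricRelabelings:
          # drop target-added labels I don't need so the series stays under
          # Stackdriver's 10-label limit (the regex is just an example)
          - action: labeldrop
            regex: (container|endpoint)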
Building on @forestoden and @vmcalvo's findings, my recent comment in #229 might be relevant as well.