GoogleCloudPlatform/prometheus-engine

Support of Job and CronJob monitoring

Closed this issue · 5 comments

Currently, kube-state-metrics Job metrics such as kube_job_status_failed and kube_job_status_succeeded are not made available for monitoring.
List of metrics:
https://github.com/kubernetes/kube-state-metrics/blob/main/docs/metrics/workload/job-metrics.md

You can manually deploy kube-state-metrics and scrape these metrics. Instructions here: https://cloud.google.com/stackdriver/docs/managed-prometheus/exporters/kube_state_metrics

We had to limit the number of kube-state metrics we collect by default so that costs are minimal. That being said, if there are enough +1s on this, we can definitely add a few more metrics, especially if they are not high-volume metrics.

@AndrasSandor let us know if @lyanco's suggestion works for you. Closing this issue for now.

Future readers, feel free to +1 or reopen this thread if there is demand for this feature.

It would be great to add Job- and CronJob-related metrics to the list. The volume would likely be insignificant. I assume, however, that they would need to be explicitly enabled, so they would only affect costs for users who opt in.

The alternative options are not appealing:

  • Deploying self-managed kube-state-metrics instead of the managed one does not offer the ability to “honor” reserved labels, like namespace/pod etc, leading to a confusing set of labels (e.g. namespace/exported_namespace, etc) and the need to modify existing rules.
  • Deploying pushgateway and modifying all jobs to push metrics to it would add complexity and require additional effort.

Hey @ksoftirqd - thanks for reaching out.

> Deploying self-managed kube-state-metrics instead of the managed one does not offer the ability to “honor” reserved labels, like namespace/pod etc, leading to a confusing set of labels (e.g. namespace/exported_namespace, etc) and the need to modify existing rules.

Actually, if you specify a ClusterPodMonitoring like the one we show in examples/, those labels should be honored for the kube-state-metrics exporter. Have you tried this?
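
For future readers, here is a rough sketch of what such a ClusterPodMonitoring might look like. It is not copied from the repository: the resource name, the pod label, the port name `metrics`, the scrape interval, and the keep regex are all assumptions and need to match your own kube-state-metrics deployment. The field names should also be checked against the CRD version installed in your cluster. The part that matters for this discussion is the explicitly empty `targetLabels.metadata` list, which is intended to stop the collector from attaching its own namespace/pod target labels, so the labels exported by kube-state-metrics are kept instead of being renamed to exported_namespace and so on.

```yaml
# Sketch only: names, labels, and the port are assumptions, not values from this thread.
apiVersion: monitoring.googleapis.com/v1
kind: ClusterPodMonitoring
metadata:
  name: kube-state-metrics            # hypothetical resource name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics   # assumed label on the exporter pods
  endpoints:
  - port: metrics                     # assumed name of the exporter's metrics port
    interval: 30s
    metricRelabeling:
    # Optional: keep only Job and CronJob series to limit ingested volume.
    - action: keep
      sourceLabels: [__name__]
      regex: kube_(job|cronjob)_.+
  targetLabels:
    metadata: []                      # explicitly empty so the exporter's own labels are respected
```

The `keep` rule is there only because of the cost concern raised earlier in the thread; drop or widen it if you want the full set of kube-state-metrics series from the self-managed exporter.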

Hi @pintohutch,

Thank you, this is very helpful! I must have missed this example and it indeed works as expected.

If we cannot add jobs/cronjobs to the list of supported resources, this can be a great alternative to the managed kube-state-metrics exporter.

Still, it would be great to see those resources added, since the current solution seems to support the majority of resources and their metrics collection can be toggled in the cluster config.