hashicorp/nomad

nomad-sd: Expose service check status as metrics

michael-strigo opened this issue · 4 comments

Proposal

Expose a service's health check status via metrics.
Ideally, there would be a gauge for healthy vs. unhealthy allocations of a specific service.
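
To make the request concrete, here is a rough sketch of what such a gauge could look like in Prometheus text format. Everything in it is hypothetical: the metric name nomad_service_allocations, the label names, and the input shape (one (service, alloc_id, status) tuple per check) are assumptions for illustration, not something Nomad emits today. It also assumes an allocation is healthy only when all of its checks report "success".

```python
from collections import defaultdict

# Hypothetical metric name; Nomad does not emit this metric.
METRIC = "nomad_service_allocations"


def count_by_health(checks):
    """Count healthy/unhealthy allocations per service.

    checks: iterable of (service, alloc_id, status) tuples, one per check.
    Assumption for this sketch: an allocation is healthy only if every
    one of its checks has status "success".
    """
    alloc_ok = {}
    for service, alloc_id, status in checks:
        key = (service, alloc_id)
        alloc_ok[key] = alloc_ok.get(key, True) and status == "success"

    counts = defaultdict(lambda: {"healthy": 0, "unhealthy": 0})
    for (service, _alloc_id), ok in alloc_ok.items():
        counts[service]["healthy" if ok else "unhealthy"] += 1
    return {service: dict(c) for service, c in counts.items()}


def render_gauges(counts):
    """Render the counts as Prometheus text-format gauge samples."""
    lines = [f"# TYPE {METRIC} gauge"]
    for service in sorted(counts):
        for state in ("healthy", "unhealthy"):
            value = counts[service][state]
            lines.append(f'{METRIC}{{service="{service}",state="{state}"}} {value}')
    return "\n".join(lines) + "\n"
```

With one series per service and state, an alert could fire whenever the "unhealthy" series for a service is above zero.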

Use-cases

Allow external tools to detect when a service becomes unhealthy.

Hi @michael-strigo and thanks for the suggestion. This seems like a nice idea, so I'll add it to the backlog.

The Nomad Metrics Reference document lists the nomad.nomad.job_summary.running (aka nomad_nomad_job_summary_running) metric, as well as a few others that match up with the "Allocation Status" section of the Nomad web UI. However, I agree, it would be nice to have access to the Placed/Desired/Healthy/Unhealthy stats.
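
For anyone who wants to read the existing metric in the meantime, here is a small sketch that sums every series of one metric from Prometheus text-format output, such as what the agent returns from /v1/metrics?format=prometheus. The helper name sum_metric is made up for this example, and the parser is deliberately simplified: it assumes each sample line ends with the value and carries no trailing timestamp.

```python
def sum_metric(exposition_text, name):
    """Sum the values of all series of `name` in Prometheus text output.

    Simplification: each sample line is assumed to be "<series> <value>"
    with no trailing timestamp.
    """
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        series, _, value = line.rpartition(" ")
        if series == name or series.startswith(name + "{"):
            total += float(value)
    return total
```

Called as sum_metric(text, "nomad_nomad_job_summary_running"), this gives the running count across all returned series, but it still says nothing about whether those allocations pass their service checks.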

I've created a small Prometheus exporter to address this issue for now: https://github.com/strigo/nomad-service-discovery-exporter

Agreed, this seems like an obvious win, on the Consul side as well.