nomad-sd: Expose service check status as metrics
michael-strigo opened this issue · 4 comments
Proposal
Expose each service's health check status via metrics.
Ideally, there would be a gauge for healthy vs. unhealthy allocations of a specific service.
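To make the request concrete, here is a rough sketch of what such a gauge could look like in the Prometheus text exposition format. It is only an illustration: the metric name nomad_service_allocations, the label names, and the input record shape (service, healthy) are all hypothetical, and nothing here calls the Nomad API or reflects how Nomad would actually implement it.

```python
# Hypothetical sketch: aggregate per-allocation check results into
# healthy/unhealthy gauges per service. Names and record shape are assumptions.


def count_by_health(checks):
    """Count healthy and unhealthy allocations for each service.

    `checks` is an iterable of dicts with the assumed keys
    "service" (str) and "healthy" (bool), one per allocation.
    """
    counts = {}
    for check in checks:
        # Start both states at 0 so a service always reports both gauges.
        per_service = counts.setdefault(
            check["service"], {"healthy": 0, "unhealthy": 0}
        )
        per_service["healthy" if check["healthy"] else "unhealthy"] += 1
    return counts


def render_gauges(counts, metric="nomad_service_allocations"):
    """Render the counts as Prometheus text-format gauge samples."""
    lines = [
        f"# HELP {metric} Allocations per service by health status (hypothetical).",
        f"# TYPE {metric} gauge",
    ]
    for service in sorted(counts):
        for status in ("healthy", "unhealthy"):
            value = counts[service][status]
            lines.append(
                f'{metric}{{service="{service}",status="{status}"}} {value}'
            )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        {"service": "web", "healthy": True},
        {"service": "web", "healthy": True},
        {"service": "web", "healthy": False},
        {"service": "db", "healthy": True},
    ]
    print(render_gauges(count_by_health(sample)), end="")
```

Emitting an explicit 0 for the unhealthy state means an alert rule can compare the two series directly instead of having to handle a missing series.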
Use-cases
Allow external tools to detect cases in which a service becomes unhealthy.
Hi @michael-strigo and thanks for the suggestion. This seems like a nice idea and so I'll add it to the backlog.
The Nomad Metrics Reference document lists the nomad.nomad.job_summary.running (aka nomad_nomad_job_summary_running) metric, as well as a few others that match up with the "Allocation Status" section of the Nomad web UI. However, I agree, it would be nice to have access to the Placed/Desired/Healthy/Unhealthy stats.
I've created a small Prometheus exporter to address this issue for now: https://github.com/strigo/nomad-service-discovery-exporter
Agreed, this seems like an obvious win, and on the Consul side as well.