liske/needrestart

prometheus metrics output

Closed this issue · 3 comments

Hi!

We're (possibly) transitioning away from Icinga to Prometheus for our monitoring down here and it would be quite nice to have the equivalent functionality to Icinga, but as an OpenMetrics endpoint.

I am not exactly sure what the metrics would be like. It seems to me there could be different metrics for kernel, ucode, and services, possibly with a separation between user and system services. So something like this, maybe:

# HELP needrestart_timestamp information about the running version and when it was last updated
# TYPE needrestart_timestamp gauge
needrestart_timestamp{version="3.6"} 1700675409
# HELP needrestart_kernel_info information about the kernel
# TYPE needrestart_kernel_info info
needrestart_kernel_info{running="6.5.0-1-amd64",expected="6.5.0-1-amd64",status="current"} 1
# HELP needrestart_ucode_info information about the CPU microcode
# TYPE needrestart_ucode_info info
needrestart_ucode_info{running="0x042c",expected="0x042c",status="current"} 1
# HELP needrestart_services_count number of services requiring a restart
# TYPE needrestart_services_count gauge
needrestart_services_count 3

It would probably need gauges for containers and sessions too...
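
To make the proposal a bit more concrete, here is a rough Python sketch of how such output could be rendered. It is only an illustration: the `Status` class and its fields are hypothetical and not part of needrestart, the `"outdated"` status value is an assumption (only `"current"` appears in the example above), and the metric names and types simply mirror the example.

```python
from dataclasses import dataclass


@dataclass
class Status:
    """Hypothetical snapshot of findings; not a real needrestart API."""
    kernel_running: str
    kernel_expected: str
    ucode_running: str
    ucode_expected: str
    services: int


def escape(value: str) -> str:
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def info_line(name: str, running: str, expected: str) -> str:
    # "outdated" is an assumed status value for a version mismatch.
    state = "current" if running == expected else "outdated"
    return (f'{name}{{running="{escape(running)}",'
            f'expected="{escape(expected)}",status="{state}"}} 1')


def render(status: Status) -> str:
    lines = [
        "# HELP needrestart_kernel_info information about the kernel",
        "# TYPE needrestart_kernel_info info",
        info_line("needrestart_kernel_info",
                  status.kernel_running, status.kernel_expected),
        "# HELP needrestart_ucode_info information about the CPU microcode",
        "# TYPE needrestart_ucode_info info",
        info_line("needrestart_ucode_info",
                  status.ucode_running, status.ucode_expected),
        "# HELP needrestart_services_count number of services requiring a restart",
        "# TYPE needrestart_services_count gauge",
        f"needrestart_services_count {status.services}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render(Status("6.5.0-1-amd64", "6.5.0-1-amd64", "0x042c", "0x042c", 3)), end="")
```

Gauges for containers and sessions could be added to `render` in the same way as the services count.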

Would people here be open to this idea?

Note that there's some overlap between this and the node exporter's support for such things. That was requested in prometheus/node_exporter#625 but actually implemented in the "collectors" project. It only tracks the reboot-required file, however...
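
For comparison, a reboot-required check of that kind can be sketched in a few lines of Python for the node exporter's textfile collector, which reads `*.prom` files from a configured directory. This is a minimal sketch under assumptions, not the actual script from that project: the flag path `/run/reboot-required`, the metric name `node_reboot_required`, and the helper names are all illustrative.

```python
import os
import tempfile


def reboot_required_metric(flag_path: str = "/run/reboot-required") -> str:
    # Assumed flag path and metric name; adjust for the distribution in use.
    value = 1 if os.path.exists(flag_path) else 0
    return (
        "# HELP node_reboot_required whether a reboot is pending\n"
        "# TYPE node_reboot_required gauge\n"
        f"node_reboot_required {value}\n"
    )


def write_prom_file(directory: str, name: str, body: str) -> str:
    """Write body to directory/name via a temp file and an atomic rename,
    so a scrape never sees a half-written file."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(body)
        os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
        target = os.path.join(directory, name)
        os.replace(tmp_path, target)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target
```

Something like `write_prom_file(textfile_dir, "reboot_required.prom", reboot_required_metric())` could then run from cron or a systemd timer, with `textfile_dir` being whatever directory the node exporter was started with.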

I'm open to changes required for a metrics endpoint. 👍

I just came across this project that might be a good solution: https://git.fsmpi.rwth-aachen.de/thomas/needrestart2prom/-/tree/main