operator: expose metrics
lucab opened this issue
`update-operator` is a long-running Go process that supervises complex cluster-wide operations. As such, it should expose metrics about its status, which Prometheus can scrape and alert on. Access to this endpoint should be governed by Kubernetes RBAC policies.
This is a preliminary list of interesting metrics:
- Go runtime stats
- nodes being managed by CLUO
- nodes in `reboot-needed` state
- nodes in `before-reboot` state
- nodes in `after-reboot` state
- state of the optional "before" and "after" checks