coreos/container-linux-update-operator

operator: expose metrics

lucab opened this issue · 0 comments

lucab commented

update-operator is a long-running Go process that supervises complex cluster-wide operations. As such, it should expose metrics about its status that Prometheus can scrape and alert on. Access to this endpoint should be governed by Kubernetes RBAC policies.

This is a preliminary list of interesting metrics:

  • go runtime stats
  • nodes being managed by CLUO
  • nodes in reboot-needed state
  • nodes in before-reboot state
  • nodes in after-reboot state
  • state of the optional "before" and "after" checks