crate/crate-operator

Add Prometheus infrastructure with basic metrics

MarkusH opened this issue · 0 comments

The de facto standard for monitoring in Kubernetes is Prometheus. The CrateDB Kubernetes Operator should collect its own metrics and expose them for Prometheus to scrape.

To begin with, it would be nice to track the following metrics:

  • The total number of clusters deployed
  • The total number of clusters deleted
  • The number of clusters monitored by the operator
  • The total number of times clusters were restarted
  • The total number of times clusters were scaled
  • The total number of times clusters were upgraded
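As a rough sketch, the metrics listed above could be declared with the `prometheus_client` Python library, assuming the operator adopts it. The metric names and the `render_metrics` helper are hypothetical; nothing in this issue prescribes them. The "total number of" items map naturally to counters, while the number of clusters currently monitored can go up and down and is therefore modelled as a gauge.

```python
from prometheus_client import CollectorRegistry, Counter, Gauge, generate_latest

# A dedicated registry keeps this sketch separate from the global default one.
REGISTRY = CollectorRegistry()

# Hypothetical metric names. prometheus_client appends "_total" to counter
# names when exposing them.
CLUSTERS_DEPLOYED = Counter(
    "crate_operator_clusters_deployed",
    "Total number of clusters deployed",
    registry=REGISTRY,
)
CLUSTERS_DELETED = Counter(
    "crate_operator_clusters_deleted",
    "Total number of clusters deleted",
    registry=REGISTRY,
)
CLUSTERS_RESTARTED = Counter(
    "crate_operator_clusters_restarted",
    "Total number of times clusters were restarted",
    registry=REGISTRY,
)
CLUSTERS_SCALED = Counter(
    "crate_operator_clusters_scaled",
    "Total number of times clusters were scaled",
    registry=REGISTRY,
)
CLUSTERS_UPGRADED = Counter(
    "crate_operator_clusters_upgraded",
    "Total number of times clusters were upgraded",
    registry=REGISTRY,
)

# A gauge, because the monitored cluster count can decrease as well.
CLUSTERS_MONITORED = Gauge(
    "crate_operator_clusters_monitored",
    "Number of clusters currently monitored by the operator",
    registry=REGISTRY,
)


def render_metrics() -> str:
    """Return all metrics in the Prometheus text exposition format."""
    return generate_latest(REGISTRY).decode("utf-8")
```

To expose these over HTTP, `prometheus_client.start_http_server(port, registry=REGISTRY)` could be called once at operator startup; the port is left open here.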

Care must be taken with Kopf's idempotence: the kopf.on.* handlers may fail and will be retried until they succeed, unless they fail permanently. The metrics above should only be updated once a handler has succeeded, although there is a case to be made for tracking failures as well.
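One possible way to respect the retry behaviour is to wrap each handler in a decorator that increments a counter only after the handler returns without raising. The sketch below is an assumption, not an existing part of the operator: `count_outcome` is a hypothetical helper, the counters are any objects with an `inc()` method (for example `prometheus_client.Counter`), and `PermanentError` is a local stand-in for `kopf.PermanentError` so the snippet runs without Kopf installed.

```python
import asyncio
import functools


class PermanentError(Exception):
    """Local stand-in for kopf.PermanentError (assumption for this sketch)."""


def count_outcome(succeeded, failed=None):
    """Count a handler run only once its outcome is final.

    `succeeded` and `failed` are duck-typed counters exposing `inc()`.
    """

    def decorator(handler):
        @functools.wraps(handler)
        async def wrapper(*args, **kwargs):
            try:
                result = await handler(*args, **kwargs)
            except PermanentError:
                # The handler gave up for good, so this failure is final.
                if failed is not None:
                    failed.inc()
                raise
            # Any other exception propagates uncounted: Kopf will retry
            # the handler, and only the eventual success is recorded.
            succeeded.inc()
            return result

        return wrapper

    return decorator
```

In the operator, the decorator would sit beneath the `kopf.on.*` decorator so that Kopf registers the wrapped coroutine. Counting only permanent failures is one choice among several; retried attempts could be tracked in a separate counter if that turns out to be useful.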