seznam/slo-exporter

[ENHANCEMENT] - improve (simplify and document) check of slo-exporter configuration validity

Problem: the configuration is decoupled. The slo-exporter configuration describes how to classify any given event and how to evaluate its success. The SLI thresholds themselves, however, have to be configured separately, as PromQL recording rules outside of slo-exporter. An error or typo can therefore easily leave SLO alerting unavailable, for example through a slo_class or domain mismatch between the data provided by slo-exporter and the SLO domain thresholds.

DoD:
The documentation explicitly points out this issue and provides a list of steps describing how to check that SLO alerting for a given domain (a combination of slo_domain, slo_class, and slo_type, as taken from the metrics exposed by a slo-exporter instance) is enabled and in operation.

Proposal:
either:

  • just documentation
  • documentation together with a script to automate the manual steps
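
To make the second option more concrete, here is a minimal sketch of the core check such a script could perform. It is not part of slo-exporter. It assumes the label sets have already been collected, one list from the metrics exposed by the slo-exporter instance and one from the threshold recording rules, and it compares them on the (slo_domain, slo_class, slo_type) combination. The function names `label_keys` and `check_coverage` and the sample label values are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Labels that together identify one SLO "domain" as described in this issue.
KEY_LABELS = ("slo_domain", "slo_class", "slo_type")
Key = Tuple[str, ...]


def label_keys(series: Iterable[Dict[str, str]]) -> Set[Key]:
    """Collect the distinct (slo_domain, slo_class, slo_type) combinations.

    Series that lack any of the three labels are skipped.
    """
    keys: Set[Key] = set()
    for labels in series:
        if all(name in labels for name in KEY_LABELS):
            keys.add(tuple(labels[name] for name in KEY_LABELS))
    return keys


def check_coverage(
    exported: Iterable[Dict[str, str]],
    thresholds: Iterable[Dict[str, str]],
) -> Dict[str, List[Key]]:
    """Compare exporter label sets with threshold label sets (hypothetical helper).

    missing_threshold: combinations exported by slo-exporter with no threshold,
                       so no alert can fire for them.
    unused_threshold:  thresholds that match no exported data, often the other
                       half of the same typo.
    """
    exported_keys = label_keys(exported)
    threshold_keys = label_keys(thresholds)
    return {
        "missing_threshold": sorted(exported_keys - threshold_keys),
        "unused_threshold": sorted(threshold_keys - exported_keys),
    }


if __name__ == "__main__":
    # Made-up sample data; note the "critcal" typo on the exporter side.
    exported = [
        {"slo_domain": "example", "slo_class": "critical", "slo_type": "availability"},
        {"slo_domain": "example", "slo_class": "critcal", "slo_type": "latency"},
    ]
    thresholds = [
        {"slo_domain": "example", "slo_class": "critical", "slo_type": "availability"},
        {"slo_domain": "example", "slo_class": "critical", "slo_type": "latency"},
    ]
    for problem, keys in check_coverage(exported, thresholds).items():
        for key in keys:
            print(problem, dict(zip(KEY_LABELS, key)))
```

How the two input lists are obtained (for example by querying Prometheus for the relevant series) is left open here, since the metric and rule names depend on the deployment.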