swisstxt/onduty

Improve Alert Notification trigger rule

Closed this issue · 1 comments

To avoid triggering alert notifications "too early", Onduty should not rely on the alert counter alone; it should also ensure that at least one of the related Icinga service checks has reached a hard state (critical for 3 minutes in a row).
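
A minimal sketch of the proposed rule, written in Python for illustration only. The names `ServiceCheck` and `should_notify` and the limit value are hypothetical and are not taken from Onduty's code; the sketch also assumes that "crossed" means strictly greater than the limit.

```python
from dataclasses import dataclass

# Hypothetical value; in Onduty the limit comes from ONDUTY_ALERT_LIMIT.
ALERT_LIMIT = 3


@dataclass
class ServiceCheck:
    name: str
    state_type: str  # "SOFT" or "HARD", as reported by Icinga


def should_notify(alert_count, checks, limit=ALERT_LIMIT):
    """Notify only if the counter crossed the limit AND a check is in a hard state."""
    counter_crossed = alert_count > limit
    any_hard = any(check.state_type == "HARD" for check in checks)
    return counter_crossed and any_hard
```

With this rule, an alert whose counter is above the limit but whose checks are all still in a soft state would not trigger a notification.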

In the example below, no notification should be sent, even if the alert counter has crossed ONDUTY_ALERT_LIMIT.

image

Attention (to be studied): there is a potential risk of not reporting a continuously flapping service, since such a service may never reach a hard state. Having two alert limits could be a solution: a hard state limit (e.g. >1) and a soft state limit (e.g. >5).
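
The two-limit idea could look roughly like the following Python sketch. The function name, parameter names, and default values are hypothetical (the values simply reuse the ">1" and ">5" examples above), and this is one possible reading of the proposal rather than an agreed design.

```python
# Hypothetical defaults, taken from the examples in the text above.
HARD_STATE_LIMIT = 1
SOFT_STATE_LIMIT = 5


def should_notify_two_limits(alert_count, any_hard_state,
                             hard_limit=HARD_STATE_LIMIT,
                             soft_limit=SOFT_STATE_LIMIT):
    """Use a low limit once a hard state is seen, a higher one otherwise."""
    if any_hard_state:
        return alert_count > hard_limit
    # No hard state yet: a flapping service still notifies eventually,
    # but only after the higher soft-state limit is crossed.
    return alert_count > soft_limit
```

Under these assumptions, a service that keeps flapping without ever reaching a hard state would still be reported once its counter exceeds the soft state limit.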

For the moment, it is recommended to resolve this issue "upstream", e.g. by triggering an Onduty alert only when the HARD state is reached on the monitoring system. Onduty abstracts this aspect away: all reported alerts are considered fully legitimate, and the Onduty "counter logic" only adds some delay to give the service a chance to auto-heal before the on-duty contact is alerted.