oetiker/SmokePing

alert not firing loss pattern = >0%,*10*,>0%

Closed this issue · 2 comments

I am not great with Perl, but I have been trying to better understand the alerting behavior for patterns such as
pattern = >0%,*10*,>0%
as documented at
https://oss.oetiker.ch/smokeping/doc/smokeping_config.en.html

Here is an SSH probe target we configured, with some recent data from our testing:
[image: SmokePing graph for the SSH probe target]
Here is the probe config:
+SSH
binary = /usr/bin/ssh-keyscan
forks = 5
offset = 50%
step = 60
timeout = 5

# The following variables can be overridden in each target section

keytype = rsa
pings = 5

and the alert:
+NETENG-L-BACKBONE-TCP-LOSS
type = loss
pattern = >0%,*10*,>0%
comment = Two failed connections in 10 polls
to = |/etc/smokeping/etc/smokedetector-no-merge
edgetrigger = yes

The target config is just the host name; none of the probe attributes are overridden.

Based on the graph above, we expected this alert to fire: we simulated failures on two polls within 10 minutes (possibly three, counting the 1/5 loss shown just before one of the complete outages).

We were monitoring the system logs, alert script, and e-mail that should have shown this alert firing, but no such alert event was triggered. The alert does trigger if we have two 100% loss events in a row.

Can someone shed some further light on this alert pattern and why it is not firing for 2-3 loss events within 10 minutes, but does trigger for two complete loss events in a row?
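For anyone reading along, here is a rough Python sketch of one way to read a pattern like the one above. It is not SmokePing's Perl implementation; `parse_pattern` and `matches` are hypothetical names. It assumes that the pattern is matched against the most recent samples, with the last condition applied to the newest one, and that `*10*` stands for anywhere from 0 to 10 samples of any value. Unknown samples (`U`) are not handled.

```python
import operator
import re

# Comparison operators accepted by this sketch.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def parse_pattern(pattern):
    """Split a loss pattern such as '>0%,*10*,>0%' into tokens."""
    tokens = []
    for part in pattern.split(","):
        part = part.strip()
        gap = re.fullmatch(r"\*(\d+)\*", part)
        if gap:
            # '*N*' is read here as "up to N samples of any value".
            tokens.append(("gap", int(gap.group(1))))
            continue
        cond = re.fullmatch(r"(>=|<=|==|>|<)(\d+(?:\.\d+)?)%", part)
        if not cond:
            raise ValueError(f"unsupported token: {part!r}")
        tokens.append(("cond", cond.group(1), float(cond.group(2))))
    return tokens


def matches(tokens, history):
    """Return True if the tokens match the tail of history.

    history holds loss percentages, oldest first, newest last.
    Matching walks both lists from the right, so the last token
    is always tested against the newest sample.
    """
    if not tokens:
        return True
    tok = tokens[-1]
    if tok[0] == "gap":
        # Try skipping 0..N samples of any value.
        for skip in range(tok[1] + 1):
            if skip <= len(history) and matches(
                tokens[:-1], history[: len(history) - skip]
            ):
                return True
        return False
    if not history:
        return False
    _, op, threshold = tok
    return OPS[op](history[-1], threshold) and matches(tokens[:-1], history[:-1])


tokens = parse_pattern(">0%,*10*,>0%")
# One poll at 100% loss, two clean polls, then 1 of 5 pings lost (20%).
print(matches(tokens, [0, 100, 0, 0, 20]))
# A single lossy poll that has already recovered.
print(matches(tokens, [0, 0, 100, 0, 0, 0]))
```

Under these assumptions the pattern can only match on a poll where the newest sample itself shows loss, so a single failure followed by clean polls does not match until a second lossy poll arrives. With `pings = 5`, one lost connection out of five counts as 20% loss, which satisfies `>0%`.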

It looks like this did end up firing, but a couple of minutes later. I will do more testing and either provide clarifications or close this issue today.

Non-issue, confirmed after validating our test procedure and testing again.