brotandgames/ciao

Threshold for consecutive check failures

sebastianfischer opened this issue · 4 comments

Is your feature request related to a problem? Please describe.
We get a lot of "false positive" change alarms because of short DNS resolution failures (`getaddrinfo: Try again`), configuration reloads, etc.
While most people would want to be notified if their service is unreachable for even one second, in our case we would prefer to trigger alarms only when a service is offline for longer periods of time.

Considered Solution: Threshold of x consecutive failures
If we could configure a threshold for how many consecutive checks must fail before a notification is sent, this would help us immensely.
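To make the request concrete, here is a minimal sketch of the proposed behavior in Python. It is not ciao's code or API: the class `ConsecutiveFailureAlert`, its `threshold` parameter, and the `record` method are hypothetical names chosen for illustration, assuming each check yields a simple pass/fail result.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveFailureAlert:
    """Hypothetical helper: alert only after `threshold` consecutive failed checks."""

    threshold: int = 3
    failures: int = 0  # length of the current run of consecutive failures

    def __post_init__(self) -> None:
        if self.threshold < 1:
            raise ValueError("threshold must be at least 1")

    def record(self, check_ok: bool) -> bool:
        """Record one check result; return True only when an alert should be sent."""
        if check_ok:
            # Any successful check ends the failure streak.
            self.failures = 0
            return False
        self.failures += 1
        # Fire exactly once, when the streak first reaches the threshold,
        # so a long outage does not produce one alert per failed check.
        return self.failures == self.threshold


if __name__ == "__main__":
    alert = ConsecutiveFailureAlert(threshold=3)
    results = [False, False, True, False, False, False, False]
    print([alert.record(ok) for ok in results])
    # → [False, False, False, False, False, True, False]
```

With `threshold=3`, the two failures at the start are treated as a fluke because the successful check resets the count; only the third failure in an unbroken run fires. Whether to also send a recovery notification is deliberately left out of this sketch.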

Describe alternatives you've considered
Some of the false positives (getaddrinfo) seem to be related to Docker and DNS, which we are also looking into: moby/moby#32106
We have also configured a time window via cron during which ciao checks are disabled while we do our configuration reloads.

Additional context
We use Docker and ciao 1.9.4

> While most people would want to be notified if their service is unreachable for even one second, in our case we would prefer to trigger alarms only when a service is offline for longer periods of time.

You can adjust the period of time via cron.

Sorry if I phrased it poorly: this is not about a time period, but about consecutive check failures.
For example, I would like to get an alarm only after 3 consecutive checks have failed (because the first two might be flukes).

This is not in the scope of this project.

OK. I can understand that. Thanks for replying and for your work on ciao. 🫶