[Feat]: Provide a `repeat` setting for alert and reachability notifications on Netdata Cloud
hugovalente-pm opened this issue · 3 comments
Problem
At the moment, the alert and reachability notifications sent from Netdata Cloud don't have a `repeat` option like the standard alerts on the Netdata Agent.
This has now been requested by some of our customers as a critical feature, especially when working with incident management tools like PagerDuty.
Discord conversation with a user (requesting repeat notifications on reachability) here
Description
There should be a way to define a `repeat` value per Space/Room that would allow the resending of unreachable notifications.
Importance
Important
Value proposition
- Have a more robust notification process in which alert notifications are repeated (if defined) when the alert stays active for long enough, to draw the team's attention.
  @ralphm: It would be good to assess whether the repeat functionality should only be supported for `critical` alerts.
- Have a more robust notification process in which reachability notifications are repeated (if defined) when the node stays unreachable for long enough, to draw the team's attention.
Proposed implementation
Honor the `repeat` settings in alert configurations, and have a default repeat timeout at the Space level for reachability notifications.
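To make the intended behaviour concrete, here is a minimal sketch of the repeat decision. It is only an illustration: `NotificationState`, `should_notify`, and `repeat_every` are hypothetical names, not existing Netdata Cloud code, and the sketch assumes a repeat is due once the configured interval has elapsed since the last notification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class NotificationState:
    """Hypothetical per-alert (or per-node) bookkeeping; not a Netdata Cloud type."""
    active: bool                    # alert still raised / node still unreachable
    last_sent: Optional[datetime]   # when the last notification went out, if any


def should_notify(state: NotificationState,
                  repeat_every: Optional[timedelta],
                  now: datetime) -> bool:
    """Return True if a (repeat) notification should be sent at `now`."""
    if not state.active:
        return False                # nothing to repeat once the condition clears
    if state.last_sent is None:
        return True                 # the first notification is always sent
    if repeat_every is None:
        return False                # no repeat defined: notify only once
    # Repeat only after the configured interval has fully elapsed.
    return now - state.last_sent >= repeat_every
```

The same check would apply to both cases: for alerts, `repeat_every` would come from the alert's own `repeat` configuration; for reachability, from the Space-level default.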
The Discord link doesn't work for me. In general it is better to create a screenshot, to ensure we have a record.
My feeling is that you'd want repeating triggers for specific integrations, not generally. In any case separately configurable, including the repeat interval.
I don't think there should be a difference between critical and warning.
> My feeling is that you'd want repeating triggers for specific integrations, not generally. In any case separately configurable, including the repeat interval.
@ralphm: For alerts, the definition is in the alert and is not a per-integration configuration. We simply want to support the `repeat` configuration that exists in the alert config.
For the reachability notifications, I would simply add this as a Space-level / Room-level config, like the one we have now for the reachability timeout.
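As a rough sketch of how a Space-level / Room-level setting could be resolved, assuming (this is not stated above) that a Room value overrides the Space default when both are set. `effective_repeat` and its parameters are hypothetical names used only for illustration.

```python
from datetime import timedelta
from typing import Optional


def effective_repeat(space_repeat: Optional[timedelta],
                     room_repeat: Optional[timedelta]) -> Optional[timedelta]:
    """Pick the repeat interval for reachability notifications.

    Assumption: a Room-level value wins when set; otherwise the Space-level
    default applies. None means "do not repeat".
    """
    return room_repeat if room_repeat is not None else space_repeat
```

With this precedence, a Space could set a default such as one hour while an individual Room used for paging shortens it, and Rooms without their own value simply inherit the Space default.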