netdata/netdata-cloud

[Feat]: Add a `repeat` setting for alert and reachability notifications on Netdata Cloud

hugovalente-pm opened this issue · 3 comments

Problem

At the moment, the alert and reachability notifications sent from Netdata Cloud have no repeat option, unlike the standard alerts on the Netdata Agent.
Some of our customers have now requested this as a critical feature, especially when working with incident management tools like PagerDuty.

Discord conversation with a user (requesting repeat notifications on reachability) here

Description

There should be a way to define a repeat value per Space/Room that allows unreachable notifications to be resent.

Importance

Important

Value proposition

  1. Have a more robust notification process in which notifications are repeated (if defined) when the alert stays active for long enough, to draw the team's attention. @ralphm : It would be good to assess whether the repeat functionality should be supported only for critical alerts.
  2. Have a more robust notification process in which notifications are repeated (if defined) when the node stays unreachable for long enough, to draw the team's attention.

Proposed implementation

Honor the repeat settings in alert configurations and add a default repeat timeout at the Space level for reachability notifications.
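To make the intended behavior concrete, here is a minimal sketch of the "is a repeat due?" check. It is only an illustration: `RepeatPolicy` and `repeat_due` are hypothetical names, not part of Netdata Cloud, and the sketch assumes that a missing interval means "never repeat".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RepeatPolicy:
    """Hypothetical repeat setting; interval=None means repeats are disabled."""
    interval: Optional[timedelta] = None


def repeat_due(policy: RepeatPolicy, still_active: bool,
               last_sent: datetime, now: datetime) -> bool:
    """Return True when another notification should be sent.

    A repeat is due only if the condition (alert raised or node
    unreachable) still holds, a repeat interval is defined, and at
    least that interval has passed since the last notification.
    """
    if not still_active or policy.interval is None:
        return False
    return now - last_sent >= policy.interval
```

The same check would serve both cases in the value proposition: for alerts, the interval would come from the alert configuration; for reachability, from the Space-level default.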

@car12o @ralphm : I have repurposed this old feature request to support the repeat notifications functionality.

The Discord link doesn't work for me. In general it is better to create a screenshot, to ensure we have a record.

My feeling is that you'd want repeating triggers for specific integrations, not generally. In any case separately configurable, including the repeat interval.

I don't think there should be a difference between critical and warning.

> My feeling is that you'd want repeating triggers for specific integrations, not generally. In any case separately configurable, including the repeat interval.

@ralphm : For alerts, the definition is in the alert itself and is not a per-integration configuration. We simply want to support the repeat configuration that exists in the alert config.
For the reachability notifications, I would simply add this as a Space-level / Room-level config, like the one we have for the reachability timeout now.
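As a rough sketch of how the reachability repeat interval could be resolved from those two levels: the function name and parameters below are hypothetical, and the rule that a Room value overrides the Space default is an assumption, not something decided in this thread.

```python
from datetime import timedelta
from typing import Optional


def reachability_repeat_interval(
    space_default: Optional[timedelta],
    room_override: Optional[timedelta] = None,
) -> Optional[timedelta]:
    """Pick the repeat interval for reachability notifications.

    Assumption: a Room-level value, when set, takes precedence over the
    Space-level default. None at both levels means repeats are off.
    """
    if room_override is not None:
        return room_override
    return space_default
```

Alert notifications would not go through this lookup, since their repeat interval comes from the alert config itself.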