openwisp/openwisp-notifications

[feature] Batch email notifications to prevent email flooding

Opened this issue · 7 comments

Whenever there are infrastructure issues that cause lots of alerts, email boxes start to refuse receiving alert emails, which raise errors but is also bad because the email address sending the alerts could be flagged as spam.

The solution for this problem is to add the possibility to batch emails in a way that if there are more than X emails to the same user in less than X minutes, the subsequent emails get batched so they don't get sent singularly but rather summarized in a single email sent after the situation cools off (say X minutes after the user is not receiving emails).

A default configuration could be as follows:

  • if more than 1 email has been sent to a specific user in the last 30 minutes
  • subsequent emails to the same user get squashed in a summary, sending is paused for 30 minutes
  • if another email is sent to the user, it will also be added to the summary which is on hold for sending and the timer will be reset (eg: 30 more minutes), this repeats until the timer expires and the summary is finally sent

It's just an idea for now, we can improve it further.

A good example of an open source application which does this is sentry:

Screenshot from 2023-02-07 16-09-16

In our Weekly GSoC Meeting, I and @Dhanus3133 discussed this issue. Dhanus proposed this prototype for implementing this feature.

We brainstormed the following implementation in the meeting:

  1. When send_email_notification is executed
    • store the timestamp of the notification in the cache using user_id as cache key
    • do not send the email to the user
    • do not set emailed=True on the notification
    • schedule a task batch_email_notification to execute after 30 minutes from now
  2. When the send_email_notification is executed again for the same user
    • check if the last notification time is present in the cache, if not go to step 1
    • if the last notification time is less than 30 minutes ago, then return from the function. This means the current notification will be sent in the batch notification email
    • if the last notification time is more than 30 minutes ago, overwrite the time and schedule the batch_email_notification task to execute after 30 minutes from now

batch_email_notification task:

  • this task will loop over all the unsent notification of the user in the last 30 mintues Notification.objects.filter(emailed=False, recipient_id=user_id, created_gte=now()-timedelta(minutes=30))
  • we will create a separate template for sending batch notifications
  • add all the unsent notifications to the email and the email to the user.
  • if another email is sent to the user, it will also be added to the summary which is on hold for sending and the timer will be reset (eg: 30 more minutes), this repeats until the timer expires and the summary is finally sent

@nemesifier this may lead to a user never receiving a notification, if the system manages to generate atleast one notification in the 30 minutes interval.

Maybe, we can implement some kind of backoff? E.g. first batch notification is sent at 30 minutes, second after 60 minutes from the last one and so on. I think, 30 minutes is a good cooldown time for batching. We may not even need to implement the backoff mechanism.

Thinking out loud

We should not include all the notification in the email because it may lead to a very long email. Maybe, we should should show only 10 notifications in the email like in the shared screenshot and provide a URL in the end "view all notifications". We should show the correct number of notifications in the email though.

batch_email_notification task:
this task will loop over all the unsent notification of the user in the last 30 mintues Notification.objects.filter(emailed=False, recipient_id=user_id, created_gte=now()-timedelta(minutes=30))

By this, we will send the emails to all the messages created recently, but we should also validate the user email verified and the organization email preference for the messages.

If these must be even validated then I think of two different approaches we can use.

  1. Validate these details again in tasks before sending an email.
  2. We Can store the Notification IDs in a cache and use it in the celery task so this can avoid querying each notification everytime.

Let me know which one suits good.

I don't think we need to verify email preference again in the task. We are already doing it in the send_notification_email handler. That should be sufficient.

I don't think we need to verify email preference again in the task. We are already doing it in the send_notification_email handler. That should be sufficient.

Doesn't this cause an issue when we have it for two different organizations with different email preferences as we only filter it by recipient_id and not by org.

Doesn't this cause an issue when we have it for two different organizations with different email preferences as we only filter it by recipient_id and not by org.

A really good catch @Dhanus3133 👏🏼

I think, we can store the notification IDs in the cache as you proposed then.