Questions about send bulk messages and timeout

Question

Questions about send bulk messages and timeout

yurabysaha opened this issue 4 years ago · 4 comments

I can see that notify_multiple_devices in pyfcm/fcm has timeout for send request and timeout is 5.

I have near 10k registration ids and when I send push to all I can see warning in logs

Retrying (Retry(total=9, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='fcm.googleapis.com', port=443): Read timed out. (read timeout=5)")': /fcm/send

So, my questions:
Why this timeout is 5?
Why this error appears ?
Did all my Users receive push notifications?
As we can see retries here, did it mean that some of my users can be spammed by few same pushes?

Thank in advance!

Answer 1 · 2020-11-19T09:34:11.000Z

Are you running on GAE?

To your questions:

5 seconds was recommended as a reasonable time for a connection to succeed in a server environment to prevent tying up execution flow
the error appears either because FCM is down or there are connectivity issues from your VM to FCM API
I can't tell unless I can see the requests eventually getting successful in your logs
Nope

Answer 2 · 2020-11-19T09:54:29.000Z

@olucurious Hi, I use AWS and ECS.
How it works.
I have a Django app where I create a task for Celery -> Celery run code for send push.

I can see this warnings in my logs

[2020-11-18 19:04:33,119: INFO/MainProcess] Received task: contents.tasks.send_push_task[38226327-ae31-4fbb-a01c-5836898c9688]

[2020-11-18 19:04:38,451: WARNING/ForkPoolWorker-2] Retrying (Retry(total=9, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='fcm.googleapis.com', port=443): Read timed out. (read timeout=5)")': /fcm/send

[2020-11-18 19:05:11,130: INFO/ForkPoolWorker-2] Task contents.tasks.send_push_task[38226327-ae31-4fbb-a01c-5836898c9688] succeeded in 38.00989336697967s: None

By the way I got 2 same pushes on my phone, thats why i thought that it can be send twice.

What I thought.
We send request to FCM but wait only 5 seconds, we didnt get response (because we wait it no more 5 sec, but it not mean that it not finished successful) -> than retries with the same IDs, so we got few pushes, but maybe retries only happen on errors except this one (ReadTimeoutError)?

As you can see we got warning about retry after 5 seconds from start.

By the way, I checked firebase-admin-python (official library) and I can see that they have a 120 seconds for timeout
https://github.com/firebase/firebase-admin-python/blob/master/firebase_admin/messaging.py#L333
https://github.com/firebase/firebase-admin-python/blob/master/firebase_admin/_http_client.py#L35

Answer 3 · 2020-11-28T15:30:16.000Z

@olucurious Ok, I setup timeout to 0.1 second for simulate this bug that I has on my production.
So, I run Celery and try to send push as a task to me.
What I can see as a result:

[2020-11-28 15:18:43,123: INFO/MainProcess] Received task: contents.tasks.send_push_task[138776e3-9af2-488c-ac8e-3994e32ca97b]  
[2020-11-28 15:18:43,717: WARNING/ForkPoolWorker-4] Retrying (Retry(total=9, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='fcm.googleapis.com', port=443): Read timed out. (read timeout=0.1)")': /fcm/send
[2020-11-28 15:18:45,956: WARNING/ForkPoolWorker-4] Retrying (Retry(total=8, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='fcm.googleapis.com', port=443): Read timed out. (read timeout=0.1)")': /fcm/send
[2020-11-28 15:18:50,172: WARNING/ForkPoolWorker-4] Retrying (Retry(total=7, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='fcm.googleapis.com', port=443): Read timed out. (read timeout=0.1)")': /fcm/send
.........

Also, I got more than one push on my device!

I guess it need to be fixed, better way it is increase this timeout.
I checked my logs and can see that this bug appears almost every time when we send push to all users (more than 10k)

Hope, I helped to someone to prevent bug that I got in production.

Answer 4 · 2020-11-28T16:02:16.000Z

Alright, I'll create a new release with the recommended timeout in the official library