Figure out the discrepancy between donation events received and tasks run
Closed this issue · 4 comments
This is the number of messages we received from the MoFo SMS queue for the 24 hours preceding Feb 23rd at 15:41 Eastern.
And this is the number of just the process_donation_event tasks that we ran. Huge mismatch. I don't know yet whether it's real or an issue with the metrics, but based on our issues with SFDC API call volume it seems to be real.
I believe it has to do with this Celery issue: celery/celery#3270
People are saying that when a task is retried using the delay, multiple workers will pick up the task to retry. If those retries have to retry in turn, each one causes that many more tasks to be retried the next time, creating an exponential expansion of the number of tasks run.
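To get a feel for how fast that grows, here is a rough back-of-the-envelope sketch. It is not a model of Celery internals: the function name and the assumption that every retry is picked up by a fixed number of workers are mine, purely for illustration.

```python
from __future__ import annotations


def executions_per_round(duplicates_per_retry: int, retry_rounds: int) -> list[int]:
    """Count task executions in each retry round.

    Hypothetical model: every execution that retries is picked up by
    `duplicates_per_retry` workers, and every one of those retries again.
    """
    counts = [1]  # round 0: the original task runs once
    for _ in range(retry_rounds):
        counts.append(counts[-1] * duplicates_per_retry)
    return counts


counts = executions_per_round(duplicates_per_retry=2, retry_rounds=5)
print(counts)       # [1, 2, 4, 8, 16, 32]
print(sum(counts))  # 63 executions for a single incoming message
```

Even with only two workers grabbing each retry, one message that fails five times in a row would account for dozens of task runs, which is the kind of gap we're seeing between messages received and tasks run.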
Recommendation: It's time to move off of Celery.
It seems the behavior we're seeing is documented:
https://docs.celeryproject.org/en/4.4.0/getting-started/brokers/redis.html#id1
I'm going to try lengthening the visibility_timeout as recommended.
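For reference, a minimal sketch of what that change might look like. The Celery docs linked above say the Redis visibility timeout defaults to one hour and that tasks with an ETA or countdown longer than that get redelivered, so the value needs to exceed our longest retry delay. The helper name, the 6-hour figure, and the 10-minute margin below are placeholders I made up, not our real settings.

```python
from __future__ import annotations

from datetime import timedelta

# Default Redis visibility timeout per the Celery docs: 1 hour.
DEFAULT_REDIS_VISIBILITY_TIMEOUT = 3600


def broker_transport_options(
    longest_countdown: timedelta,
    margin: timedelta = timedelta(minutes=10),
) -> dict:
    """Build transport options whose visibility timeout exceeds the longest retry delay.

    `longest_countdown` and `margin` are hypothetical inputs; pick values
    that match the real retry schedule.
    """
    seconds = int((longest_countdown + margin).total_seconds())
    # Never go below the documented default.
    return {"visibility_timeout": max(seconds, DEFAULT_REDIS_VISIBILITY_TIMEOUT)}


options = broker_transport_options(timedelta(hours=6))
print(options)  # {'visibility_timeout': 22200}

# With a Celery app object this would be applied as:
#   app.conf.broker_transport_options = options
```

The tradeoff is that a longer timeout also delays redelivery of tasks that were genuinely lost when a worker died, so it shouldn't be set much higher than the longest delay we actually use.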