mozmeao/basket

Figure out the discrepancy between donation events received and tasks run

Closed this issue · 4 comments

pmac commented

[Screenshot: Screen Shot 2020-02-23 at 15 40 04]

This is the number of messages we received from the MoFo SMS queue for the 24 hours preceding Feb 23rd at 15:41 Eastern.

[Screenshot: Screen Shot 2020-02-23 at 15 41 41]

And this is the number of just the process_donation_event tasks that we ran over the same period. Huge mismatch. I don't know yet whether it's real or an issue with the metrics, but based on our issues with SFDC API call volume, it seems to be real.

pmac commented

I believe it has to do with this Celery issue: celery/celery#3270

People are saying that when a task is retried with a delay, multiple workers will pick up the task to retry. If each of those copies has to retry again, that many more tasks get retried the next time, creating an exponential expansion in the number of tasks run.
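To get a feel for how quickly that could blow up, here's a rough toy model in Python. It is not taken from Celery or from our code. It assumes the worst case, where every execution fails and each failed execution gets picked up by the same number of workers on the next round. The names `tasks_run`, `duplicates_per_retry`, and `retry_rounds` are made up for illustration.

```python
def tasks_run(duplicates_per_retry: int, retry_rounds: int) -> int:
    """Total executions caused by ONE original message under the toy model.

    Assumption (not measured): every execution fails, and each failed
    execution is picked up by `duplicates_per_retry` workers next round.
    """
    total = 0
    in_flight = 1  # the original task
    for _ in range(retry_rounds + 1):
        total += in_flight
        # Each failed copy fans out to several workers on the next retry.
        in_flight *= duplicates_per_retry
    return total


# Healthy case: one worker per retry, so growth is linear.
print(tasks_run(duplicates_per_retry=1, retry_rounds=3))
# Duplicated retries: two workers per retry, so growth is geometric.
print(tasks_run(duplicates_per_retry=2, retry_rounds=3))
```

With one worker per retry, a message that retries three times costs four executions. With two workers per retry, the same message costs fifteen, which is the kind of gap between messages received and tasks run that the graphs would show if this is what's happening.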

Recommendation: It's time to move off of Celery.

pmac commented

It seems the behavior we're seeing is documented:

https://docs.celeryproject.org/en/4.4.0/getting-started/brokers/redis.html#id1

I'm going to try lengthening the visibility_timeout as recommended.
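For reference, a minimal sketch of what that change might look like. The linked docs say the visibility timeout should cover the longest ETA or countdown in use, and that it's set through `broker_transport_options`. The helper name `transport_options`, the 6-hour countdown, and the 10-minute margin below are assumptions for illustration, not values from our settings.

```python
def transport_options(longest_countdown_s: int, margin_s: int = 600) -> dict:
    """Build a broker_transport_options dict for the Redis broker.

    The visibility timeout is set longer than the longest retry countdown
    so that a delayed task is not redelivered to another worker while it
    is still waiting to run. Both arguments are in seconds.
    """
    return {"visibility_timeout": longest_countdown_s + margin_s}


# Hypothetical value: assume our longest retry countdown is 6 hours.
BROKER_TRANSPORT_OPTIONS = transport_options(6 * 60 * 60)

# With a Celery app object, this would be applied as:
#   app.conf.broker_transport_options = BROKER_TRANSPORT_OPTIONS
print(BROKER_TRANSPORT_OPTIONS)
```

The trade-off is that a longer timeout also delays redelivery of tasks that were genuinely lost when a worker died, so the margin shouldn't be larger than it needs to be.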

pmac commented

I'm still of the opinion that we should move away from Celery and toward either RQ or Spinach.

pmac commented

This was fixed in #475