nozzlegear/ShopifySharp

LeakyBucket does not release the Semaphore if a request fails or the task is canceled

nozzlegear opened this issue · 0 comments

I've run into an interesting problem in production, where an app was suddenly swamped with thousands of calls to the Product Updated webhook endpoint, all for the same store. The LeakyBucketExecutionPolicy dutifully put all the requests into a leaky bucket, but it just couldn't get through them fast enough before Shopify begin to cancel the requests once they hit that 5 second mark.

The problem: the internal LeakyBucket itself does not wrap the semaphore.WaitAsync in a try/catch block, which means any task cancelation exception thrown here will cause the semaphore to not be released. Additionally, the request will not be removed from the queue.

The effect is that Shopify will most likely retry the webhook immediately, adding another request to the queue. The semaphore sees that the previous request is still there, so the new one needs to wait. This wouldn't be a problem for a low-volume endpoint, eventually this would all shake out and the semaphore would empty itself with a few exception messages about canceled tasks in your logs.

But with a high-volume endpoint where you're receiving hundreds of requests per second, the problem just compounds itself and the requests will sit in the LeakyBucket queue for 10, 15 or 30 minutes before their chance to be processed comes up. At this point, the only thing that can save them is Shopify's exponential backoff, where we wait for Shopify to stop trying the endpoints so quickly (and pray that more events aren't generated in the meantime).

The fix for this is to check whether the cancellation token has been canceled before we try to have the semaphore wait on it. If so, the request should be dequeued and skipped. Additionally, we should wrap the semaphore usage in a try/finally, then release the semaphore and dequeue the request if the request is cancelled.