jondot/sneakers

graceful stop will not wait for consumers to finish processing in-progress deliveries

Closed this issue · 5 comments

Hi,
I am using sneakers when rails. I am running into an issue where for some reason when the gem looses connection with the rabbit server it will just crash, however I am putting that on the back burner right now since that specific rabbit server needs to be rebuilt anyways.
So I am trying to put in a bandaid and found a somewhat major issue.
I am trying to kill the worker like this:

kill cat /var/www/my-app/tmp/pids/sneakers.pid

The logs show me:

2017-09-28T12:22:03Z p-12828 t-1g475w INFO: Received graceful stop
2017-09-28T12:22:03Z p-12834 t-1a15bc INFO: Shutting down workers
2017-09-28T12:22:04Z p-12828 t-bfuig INFO: Worker 0 finished with status 0

But looking at the output of the worker that was last doing its thing, its clear that it is not actually stopping gracefully. The last message it was processing was just stopped and it is back on the queue.

While I can account for this happening in my code (which I do anyways), it is far from ideal when my bandaid was going to be a cronjob that just restarts all the workers every hour or so while I work on a more long term solution.

I am not sure what configuration I can post that would be helpful here, I am currently using sneakers 2.3.0

@jonheckman can we define "gracefully" first, please? It's such a wondefully overused term.

What was introduced in #240 as a "graceful shutdown" makes sure workers do not begin processing any new deliveries when they stop.

Knowing when all consumers are done is not really Sneakers' concern as it cannot possible know that. Only Bunny has the necessary context.

In modern Bunny versions, Bunny::ConsumerWorkPool#shutdown has an option that waits for currently enqueued units of work to be processed. It is false by default (not sure why) and #240 naively relies on that default.

So looks like Sneakers should use shutdown(true) and that would be sufficient. Feel free to investigate if that's the case and submit a PR.

To me graceful (this is generally what I have seen) is allowing current child processes to finish what they are doing. Similar to a graceful stop/restart of nginx or apache, in which both don't drop active requests on the floor.
It actually looks like this may have been partially fixed in 2.6.0 of Bunny ruby-amqp/bunny#437 , however that is just with a timeout. I am on 2.2.0
I am going to try updating my gem versions and see what happens

@jonheckman if your consumers are idempotent, it's not necessarily an issue because unacknowledged deliveries will be requeued by RabbitMQ. Anyhow, please see my explanation above. Upgrading Bunny alone isn't guaranteed to be sufficient because Sneakers uses Bunny's semi-public API method(s).

I believe ruby-amqp/bunny#437 largely addresses this.