jpwilliams/remit

Endpoints should not be durable

jacktuck opened this issue · 9 comments

> Ack Latency for Persistent Messages
>
> basic.ack for a persistent message routed to a durable queue will be sent after persisting the message to disk. The RabbitMQ message store persists messages to disk in batches after an interval (a few hundred milliseconds) to minimise the number of fsync(2) calls, or when a queue is idle. This means that under a constant load, latency for basic.ack can reach a few hundred milliseconds. To improve throughput, applications are strongly advised to process acknowledgements asynchronously (as a stream) or publish batches of messages and wait for outstanding confirms. The exact API for this varies between client libraries.

https://www.rabbitmq.com/confirms.html
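The "publish batches of messages and wait for outstanding confirms" advice from the quoted docs can be sketched in TypeScript as below. This is a minimal sketch, not remit's or amqplib's code: `ConfirmedPublish` and `publishInBatches` are hypothetical names, and it assumes a publish function that returns a promise resolving once the broker confirms the message (amqplib's `ConfirmChannel#publish` takes a callback that fires on confirm, which could be wrapped this way).

```typescript
// Hypothetical: resolves once the broker has confirmed (acked) this message.
// With amqplib this could wrap ConfirmChannel#publish's callback; that wiring is assumed.
type ConfirmedPublish = (message: string) => Promise<void>;

// Publish `messages` in batches of `batchSize`: start every publish in a batch,
// then wait for all of that batch's confirms before starting the next batch.
// Returns the number of batches sent.
async function publishInBatches(
  publish: ConfirmedPublish,
  messages: string[],
  batchSize: number,
): Promise<number> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let batches = 0;
  for (let i = 0; i < messages.length; i += batchSize) {
    const batch = messages.slice(i, i + batchSize);
    // All publishes in the batch are in flight together, so a slow ack
    // (e.g. one waiting on a batched fsync) is waited on once per batch.
    await Promise.all(batch.map((m) => publish(m)));
    batches += 1;
  }
  return batches;
}
```

Because every publish in a batch is started before any confirm is awaited, a few-hundred-millisecond ack delay is paid roughly once per batch rather than once per message; the batch size is a tuning knob, not a value from the docs.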

Good find in the docs!

Having them be durable, however, does mean that an endpoint can die and then come back up and respond to messages that haven't yet reached their timeout - it's a nice feature to have on occasion and has saved some of our services from causing larger failures before.

Batching might be the way to go, I suppose? But we'd need some rules for how/when it's done.
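One possible rule for how/when to batch is "flush when the batch is full or when its oldest message has waited too long", which caps both the batch size and the latency batching adds. A minimal sketch, assuming hypothetical `maxSize` and `maxWaitMs` limits (no values were agreed in this thread); `Batcher` is an illustrative name, and the caller supplies the current time so the rule stays deterministic.

```typescript
// Size-or-age batching rule: flush when `maxSize` items are pending, or when
// the oldest pending item has waited at least `maxWaitMs`.
// Callers pass the current time (ms) so the rule is deterministic.
class Batcher<T> {
  private readonly maxSize: number;
  private readonly maxWaitMs: number;
  private pending: T[] = [];
  private oldestAt: number | null = null;

  constructor(maxSize: number, maxWaitMs: number) {
    this.maxSize = maxSize;
    this.maxWaitMs = maxWaitMs;
  }

  // Add an item; returns a full batch to send, or null if not full yet.
  add(item: T, now: number): T[] | null {
    if (this.pending.length === 0) this.oldestAt = now;
    this.pending.push(item);
    return this.pending.length >= this.maxSize ? this.flush() : null;
  }

  // Call periodically; returns the pending batch if its oldest item is too old.
  tick(now: number): T[] | null {
    if (this.oldestAt !== null && now - this.oldestAt >= this.maxWaitMs) {
      return this.flush();
    }
    return null;
  }

  private flush(): T[] {
    const batch = this.pending;
    this.pending = [];
    this.oldestAt = null;
    return batch;
  }
}
```

In practice `tick` would be driven by a timer such as `setInterval`, and each returned batch handed to whatever does the publishing.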

> Having them be durable, however, does mean that an endpoint can die and then come back up and respond to messages that haven't yet reached their timeout

That is expected IMO. Otherwise there is no difference between endpoint and listener semantics.

Aye, so making them non-durable isn't an option there, then. So, batching?

Well, my point is that I think endpoints should not be durable, and it's fine that messages can be lost. Retrying would cover the case where an endpoint dies and then comes back up.
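Retrying on the requester side, as suggested here, could look something like the sketch below. `withRetries` is a hypothetical helper, not part of remit; the attempt count and the exponential backoff delays are illustrative assumptions.

```typescript
// Retry `operation` up to `attempts` times in total, sleeping
// delayMs, 2*delayMs, 4*delayMs, ... between tries (exponential backoff).
// Rethrows the last error if every attempt fails.
async function withRetries<T>(
  operation: () => Promise<T>,
  attempts: number,
  delayMs: number,
): Promise<T> {
  if (!Number.isInteger(attempts) || attempts < 1) {
    throw new RangeError("attempts must be a positive integer");
  }
  let lastError: unknown;
  for (let n = 0; n < attempts; n++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (n < attempts - 1) {
        // Give a restarting endpoint time to come back up before retrying.
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs * 2 ** n));
      }
    }
  }
  throw lastError;
}
```

One caveat: if the first attempt was actually handled but its response was lost, the endpoint sees the request twice, so endpoints that get retried are safest when they are idempotent.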

Ah, you reckon? I guess it makes sense for an endpoint not to back up requests for itself, though then I suppose we'd need something so that requesters weren't left waiting for services to come back up (i.e., hitting a dead endpoint should return an error immediately rather than waiting 30 seconds). #43's a discussion for that.

If we did do that, I think I'd like to be able to add durable endpoints as an added option; they are very useful sometimes, but if they have a scary performance hit under load and act a bit differently, we can just explain that in the docs for the option and folks can use it if they choose.

Does that sound sensible?

#45 was an issue regarding the possibility of a "transient" option for endpoints (what we're suggesting is the default here).

Would making it the default be considered a breaking change? It wouldn't actively break things, but it does change how they behave, albeit indirectly.

> what we're suggesting is the default here

correct

> Would making it the default be considered a breaking change?

I think so

> add durable endpoints as an added option; they are very useful sometimes

As long as `durable: false` is the default, I think that's cool.
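A sketch of the agreed default, assuming a hypothetical remit-style `durable` endpoint option that maps onto the `durable` flag of amqplib's `Channel#assertQueue`. `EndpointOptions`, `QueueOptions` and `endpointQueueOptions` are illustrative names, not remit's API.

```typescript
// Hypothetical remit-style endpoint options; not remit's actual API.
interface EndpointOptions {
  durable?: boolean;
}

// The one assertQueue option this sketch cares about.
interface QueueOptions {
  durable: boolean;
}

// Endpoints are transient unless the caller explicitly opts in to durability.
function endpointQueueOptions(opts: EndpointOptions = {}): QueueOptions {
  return { durable: opts.durable ?? false };
}
```

Passing `durable: false` explicitly matters here: amqplib's `assertQueue` treats `durable` as `true` when it isn't supplied, so simply omitting the option would still produce durable queues.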

I'll set up a PR stub that we can use to start working on it.

Looking at this a bit more carefully, it doesn't look like this is what's affecting us.

The docs there mention persistent messages that travel to durable queues, but messages sent via requests and emits are always transient (a message is only persistent if `persistent: true` or `deliveryMode: 2` is set in the publishing options).

This may still be an intuitive change, but I don't think it'll have any lasting effect on response times etc.

Source: the underlying amqplib docs, `Channel#publish` section
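The persistence rule from `Channel#publish` can be written out as a small predicate, which also captures the RabbitMQ docs' point that the disk-write ack delay only applies when a persistent message lands on a durable queue. This is a sketch based on a simplified reading of amqplib's documented options; `PublishOptions`, `isPersistent` and `ackWaitsForDisk` are hypothetical names.

```typescript
// The two amqplib Channel#publish options that control persistence.
interface PublishOptions {
  persistent?: boolean;
  deliveryMode?: boolean | number;
}

// `persistent`, when supplied, overrides `deliveryMode`; otherwise
// deliveryMode 2 (or a truthy boolean) means persistent, and 1, a falsy
// value, or no option at all means transient.
function isPersistent(opts: PublishOptions = {}): boolean {
  if (opts.persistent !== undefined) return opts.persistent;
  if (typeof opts.deliveryMode === "number") return opts.deliveryMode === 2;
  return Boolean(opts.deliveryMode);
}

// basic.ack only waits on the disk write for persistent messages on durable queues.
function ackWaitsForDisk(opts: PublishOptions, queueDurable: boolean): boolean {
  return isPersistent(opts) && queueDurable;
}
```

Since requests and emits are published without either option, `isPersistent` returns `false` for them, so `ackWaitsForDisk` is `false` whether or not the endpoint's queue is durable.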