rabbitmq/rabbitmq-delayed-message-exchange

Delay interval predictability

brianbarclay opened this issue · 9 comments

We have noticed a general issue with the reliability of delayed messages using the x-delay message header and x-delayed-type exchange. Specifically, as the size of the mnesia database grows, delayed messages become gradually more and more delayed. The delayed message plugin does not respect the delay times in any predictable way. We have some messages that delay 2000 milliseconds, some that delay 900000 milliseconds (15 minutes), and a variety in between. We are seeing messages come through the system as much as 3 days after being queued with an x-delay header of 900000. This behavior is similar both inside a 2-node production cluster and in our test environment, which consists of only a single node. We are able to work around the issue by deleting the mnesia tables and allowing them to be recreated by the system.
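For readers unfamiliar with the setup described above, here is a minimal sketch in Python that expresses it as plain data, with no client library involved. It assumes the plugin's documented exchange type `x-delayed-message`, its `x-delayed-type` declaration argument, and the per-message `x-delay` header. The helper names (`exchange_declaration`, `delayed_headers`, `overshoot_factor`) and the exchange name `jobs` are hypothetical, chosen only for illustration. The last helper puts a number on the reported symptom.

```python
from datetime import timedelta

# Exchange type registered by the plugin. The routing behaviour of the
# exchange is chosen through the "x-delayed-type" declaration argument.
DELAYED_EXCHANGE_TYPE = "x-delayed-message"


def exchange_declaration(name, routing_type="direct"):
    """Describe a delayed exchange declaration as plain data (hypothetical helper)."""
    return {
        "exchange": name,
        "exchange_type": DELAYED_EXCHANGE_TYPE,
        "arguments": {"x-delayed-type": routing_type},
    }


def delayed_headers(delay_ms):
    """Message headers asking the plugin to hold the message for delay_ms milliseconds."""
    return {"x-delay": int(delay_ms)}


def overshoot_factor(requested_ms, observed):
    """How many times longer than requested the message was actually held."""
    return observed / timedelta(milliseconds=requested_ms)


if __name__ == "__main__":
    print(exchange_declaration("jobs"))
    print(delayed_headers(900000))
    # A 15-minute delay that arrives after 3 days.
    print(overshoot_factor(900000, timedelta(days=3)))
```

With the figures from this report, a message requested at 900000 ms and delivered 3 days later was held 288 times longer than asked.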

To how many rows does it grow?

This plugin is often used as if RabbitMQ were a database. It is not. Beyond a certain number of delayed messages, the current design is not going to work, and this plugin is not a priority for our team at the moment.

Another source of variability is the fact that this plugin relies on Erlang timers. Once there are enough long-lived timers in the system, they begin to compete for scheduler resources and time drift accumulates.
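One way to observe this drift from the application side is to stamp each message at publish time and compare against the arrival time in the consumer. The sketch below is only an illustration under stated assumptions: the headers `sent-at-ms` and `requested-delay-ms` are hypothetical application-level headers, not something the plugin defines, and the comparison is only meaningful if publisher and consumer clocks are synchronized.

```python
import time


def now_ms():
    """Current wall-clock time in milliseconds."""
    return int(time.time() * 1000)


def stamp(delay_ms, sent_at_ms=None):
    """Headers for a delayed message, plus hypothetical bookkeeping headers.

    "x-delay" is the header the plugin reads. "sent-at-ms" and
    "requested-delay-ms" are assumptions made for this sketch so the
    consumer can compute drift without relying on the plugin.
    """
    sent = now_ms() if sent_at_ms is None else sent_at_ms
    return {
        "x-delay": delay_ms,
        "sent-at-ms": sent,
        "requested-delay-ms": delay_ms,
    }


def drift_ms(headers, received_at_ms=None):
    """Milliseconds by which delivery missed its due time (positive means late)."""
    received = now_ms() if received_at_ms is None else received_at_ms
    due = headers["sent-at-ms"] + headers["requested-delay-ms"]
    return received - due


if __name__ == "__main__":
    headers = stamp(2000, sent_at_ms=1_000_000)
    # Delivered 2500 ms after publishing, so 500 ms late.
    print(drift_ms(headers, received_at_ms=1_002_500))
```

Logging this value per message, alongside the number of messages currently delayed, would show at what volume drift starts to grow in a given environment.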

We do not know exactly how many rows, but we believe it grows to over a million.

I'd not expect this design to support millions of delayed messages. It wasn't created for heavy-duty workloads and should not be used as a secondary (let alone primary) data store for delayed operations.

One day it may be more suitable for those tasks but not today.

Thanks. We very much appreciate your candor.

What is the best threshold for production if I limit the growth of delayed messages?

This depends on your environment. Please ask questions on the mailing list in the future.

I understand it's not a DB, but there should at least be configuration options for this, or failing that, a clarification in the docs. It would save a lot of time in capacity planning. It's a question of RabbitMQ's integrity.

We plan to re-architect this plugin. Delay predictability may or may not improve meaningfully as a result; it is an open-ended question right now. The limitations of this plugin are reasonably extensively documented.