Data loss scenarios

Question

kristianmo opened this issue 2 years ago · 3 comments

I am trying to get my head around the scenarios where it is possible to lose messages when running VerneMQ as a single node.

Is there a hierarchy/workflow for storage, along the lines of the following?

Receive message -> Inflight is full -> online queue is full -> leveldb
Receive message -> client with clean session set to false is offline -> offline queue is full -> leveldb

The online, offline and inflight queues are in memory, so if the broker is killed, then inflight, online queue, and offline queue are lost?

Is the way to reduce possible data loss to tune the size of these queues, accepting the extra overhead where possible?

Answer 1 · 2022-11-01T08:30:12.000Z

@kristianmo in theory, a single VerneMQ node can only lose messages when the queues overflow. (This is intentional and follows the user's settings for max_online_messages and max_offline_messages.)

max_inflight is not a queue. The offline and online queues are the same queue, just in different states of the queue state machine. Every queue is exclusively owned by a single consumer.
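
To make the "one queue, two states" idea concrete, here is a minimal Python sketch. It is a toy model, not VerneMQ's implementation: the `ConsumerQueue` class is hypothetical, and the drop policy (rejecting the newest message on overflow) is an assumption made for illustration. Only the setting names max_online_messages and max_offline_messages come from this thread.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ConsumerQueue:
    """Toy model of one per-consumer queue (hypothetical, not VerneMQ code)."""
    max_online_messages: int
    max_offline_messages: int
    online: bool = True
    messages: deque = field(default_factory=deque)
    dropped: int = 0

    def limit(self) -> int:
        # The same queue is bounded by a different setting depending on its state.
        return self.max_online_messages if self.online else self.max_offline_messages

    def enqueue(self, msg: str) -> bool:
        limit = self.limit()
        # A limit of -1 is treated as "unbounded" in this model.
        if limit != -1 and len(self.messages) >= limit:
            self.dropped += 1  # overflow: the point where a message is lost
            return False
        self.messages.append(msg)
        return True


q = ConsumerQueue(max_online_messages=3, max_offline_messages=2)
q.online = False  # consumer disconnected, session kept
results = [q.enqueue(f"m{i}") for i in range(4)]
print(results, q.dropped)
```

With an offline limit of 2, the third and fourth messages are the ones that overflow in this model.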

Queues are also backed on disk by LevelDB, depending on client settings and QoS. During a broker restart, Verne will reload all the needed messages from disk.

Since you do not report an actual data loss issue but ask for context, I hope this helps with some quick information.

👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.

Answer 2 · 2022-11-03T11:06:57.000Z

Correct, no issue, and it's not a problem that it can happen as long as one understands when and why. I just have to understand where it can happen so that it can be explained internally and to customers.

If I understand correctly, a QoS 1 message will be received, put into the queue backed by LevelDB, and then "acked", ensuring it is not lost.

Setting max_online_messages=-1 and max_offline_messages=-1 will eventually buffer messages that can/can't be delivered to disk, limited by disk space. Setting them to a value allows you to tune the buffer roughly to a period of time, depending on the message rate.
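
As a back-of-the-envelope check of that sizing idea, here is a small Python sketch. The helper names and the example rate are assumptions for illustration; the only thing taken from the settings is that the limit counts messages per queue.

```python
import math


def buffer_window_seconds(max_messages: int, messages_per_second: float) -> float:
    """Seconds of traffic one queue can hold before it starts to overflow."""
    if max_messages < 0:
        return float("inf")  # -1: no message-count limit, bounded by RAM/disk instead
    return max_messages / messages_per_second


def messages_for_window(window_seconds: float, messages_per_second: float) -> int:
    """Queue size needed to ride out an outage of the given length."""
    return math.ceil(window_seconds * messages_per_second)


# Assumed example: one subscriber receiving 10 messages per second.
window = buffer_window_seconds(1000, 10)  # 100.0 seconds
needed = messages_for_window(3600, 10)    # 36000 messages for a one-hour outage
print(window, needed)
```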

Answer 3 · 2022-11-03T14:22:58.000Z

> If I understand a QoS 1 message will be received, put into the queue backed up by LevelDB and then "acked", ensuring it is not lost.

Yes, kind of. The message store itself is not a queue; it just stores messages. Queues hold indexes (references to those messages). Once no queue needs a specific message anymore, it is deleted from the message store (reference counting).
There's also a minor performance optimisation where we avoid the disk in case we can immediately and fully "get rid" of a message. I think it's a 1 second buffer, but I'd have to check.
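
The reference-counting idea can be sketched in a few lines of Python. This is an illustrative in-memory model only: the `MessageStore` class and its methods are hypothetical and do not reflect VerneMQ's actual LevelDB layout.

```python
class MessageStore:
    """Toy reference-counted store (hypothetical, for illustration only)."""

    def __init__(self):
        self._payloads = {}  # msg_id -> payload
        self._refcount = {}  # msg_id -> number of queues still referencing it

    def put(self, msg_id, payload, queue_count):
        # Store the payload once, no matter how many queues reference it.
        self._payloads[msg_id] = payload
        self._refcount[msg_id] = queue_count

    def release(self, msg_id):
        # Called when one queue no longer needs the message.
        self._refcount[msg_id] -= 1
        if self._refcount[msg_id] == 0:
            del self._payloads[msg_id]
            del self._refcount[msg_id]

    def __contains__(self, msg_id):
        return msg_id in self._payloads


store = MessageStore()
queue_a = ["m1"]  # queues hold references (ids), not payloads
queue_b = ["m1"]
store.put("m1", b"21.5", queue_count=2)

store.release(queue_a.pop())
still_stored = "m1" in store  # queue_b still needs it

store.release(queue_b.pop())
deleted = "m1" not in store   # last reference gone, payload removed
print(still_stored, deleted)
```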

> Setting max_online_messages=-1 and max_offline_messages=-1 will eventually buffer messages that can/can't be delivered to disk, limited by disk space. Setting them to a value allows you to tune it roughly to a period of time depending on the message rate.

Yes. Note that those are global settings, but they are applied individually to each queue. With no limit set, you'll run out of RAM or disk, whichever comes first.
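
Because the limit applies per queue, the total footprint scales with the number of queues. A rough upper-bound estimate in Python, where the function name and all example numbers are assumptions, and broker overhead and payloads shared between queues are ignored:

```python
def worst_case_bytes(queue_count: int, max_messages_per_queue: int,
                     avg_message_bytes: int) -> int:
    """Upper bound if every queue fills up with distinct messages."""
    return queue_count * max_messages_per_queue * avg_message_bytes


# Assumed example: 10,000 offline sessions, limit of 1,000 messages each,
# 200-byte average payload.
total = worst_case_bytes(10_000, 1_000, 200)
print(f"{total / 1e9:.1f} GB")
```

The same per-queue limit that looks small for one client can therefore add up to gigabytes across many sessions.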
