albe/node-event-storage

Reconsider making partitions backward-scannable

Closed this issue · 3 comments

albe commented

Currently, partitions can only be scanned forward from a known position (0 without a persisted index), because the data block size is only stored in the document header/prefix.
This makes things simpler in the partition, but strictly requires a consistent index in order to allow random range access.
If the document size would also be appended to the document, the partition would effectively become a double linked list and allow backward scanning from a known position (e.g. file end).

This would allow a couple of optimizations, most notably in order to detect a failed write the index would become irrelevant and hence allow recovery on a partition level (see also #31), because for that we need to scan the partition backwards and find the last finished commit and truncate up to that.
Possibly worth to note that a failed write will likely have an invalid "document size" at the end though, so likely it's not as trivial for this specific use-case.

This is a follow-up to #125

albe commented

Other possible solution for this use-case of finding the last finished write: Add a detectable magic byte sequence to every document header, so that a header is "searchable". Then check if the document for that header is complete or truncate before the header.

albe commented

Note: A magic byte sequence needs to be unique enough, that it will not occur in the message data. As long as that message data is not binary, that should be plausible with a sequence that is in the "realistically not used unicode space" or the ASCII control space (0x00-0x1F, a good candidate would be 0x1E "Record Separator" possibly surrounded by 0x00)

albe commented

The current serialization already appends a "\n" after every document record. This means an unfinished-write can easily be detected by checking if the last character in the partition is "\n". If it isn't - and that is the more problematic case, the previous document ending needs to be found. Again, with current serialization format, that is always matching the regex }\s*\n. However, this sequence of characters might also appear inside a regular document body. So the best solution is probably to adjust the document separator character to "(\0x00\0x00)\0x1E\n".

This would also make the whole partition backward scannable, though not in the most efficient way, as still every byte needs to be read.