Failed write corrupts LIFO queue file
ArturGaspar opened this issue · 5 comments
If a write fails when pushing an item to the LIFO queue, some data may be actually written before an exception is raised. When trying to pop, the end of the file is supposed to have the size of the last item, but it actually contains whatever was in the partially written data from previous failed push() calls.
>>> from queuelib import LifoDiskQueue
>>> q = LifoDiskQueue("./small_fs/queue") # Filesystem has less than 100KB available.
>>> for i in range(100):
... q.push(b'a' * 1000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "queuelib/queue.py", line 152, in push
self.f.write(string)
IOError: [Errno 28] No space left on device
>>> q.pop()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "queuelib/queue.py", line 162, in pop
self.f.seek(-size-self.SIZE_SIZE, os.SEEK_END)
IOError: [Errno 22] Invalid argument
The error above comes from the value of size
, which is decoded from the last SIZE_SIZE
bytes of the file. When that value is larger than the file itself the seek will fail.
attempt (not finished yet) to fix of error either for LIFO and FIFO
https://github.com/artscrap/queuelib/tree/fixdiskwrite
I’m thinking we could solve this by:
- Replacing every
\
by\\
in the inputstring
while pushing. - Adding
\n
after thesize
in every write. - If upon reading where
\n
should be what’s read is not\n
, find the earliest\n
, and truncate the file contents to that point. - Replacing every
\\
by\
on the string read while popping.
Because of the format change, as well as a potential performance impact of slash conversions, I wonder whether we should change the existing queue class or implement a new one that follows this approach (e.g. RecoverableLifoDiskQueue
).
If we do modify the existing queue class, I wonder how to handle format versioning. For example, should we detect old-format files and upgrade them on disk initialization? Or should we empty such files and start from scratch?
Not sure if this makes sense here or as a separate issue, but I ran into a different corruption FIFO issue: When the drive ran out of space, somehow the info.json
file got truncated. Right now I'm manually recovering it by iterating through the entries up to the entry I know I popped/added.
I am having the same issue. It should at least be documented that the LifoDiskQueue
is not safe against corruption at the moment.