`PackedObjectReader` does not implement `readline` - fails unpicking!
Opened this issue · 2 comments
In a use case where an output of a calculation (a GeneralData
type of an aiida-workgraph
task) is automatically pickled and used as an input by a subsequent calculation, pickle
fails to unpickle the pickled output due to a missing readline
method on the file handler, which is of type disk-objectstore.utils.PackedObjectReader
.
After chatting with @sphuber, it is understood that due to an optimization step by the disk-objectstore
, the file handler changed from io.BufferedReader
to disk_objectstore.utils.PackedObjectReader
. Unfortunately, as I'm unclear as to when this optimization takes place, I am unable to reproduce the issue at the moment, .
In any case, though disk_objectstore.utils.PackedObjectReader
implements read
, it does not implement readline
, both required by pickle
for unpickling. As such, pickle
fails to unpickle the output.
@sphuber recommends/suggests that disk_objectstore.utils.PackedObjectReader
is revisited to ensure it can be treated on an equal footing as an io.BufferedReader
.
Unfortunately, as I'm unclear as to when this optimization takes place, I am unable to reproduce the issue at the moment, .
The crux is that the bug will only appear when the pickled object is part of a pack. If it is a loose file, the API will return a normal file handle when opening it. However, if it is part of a pack, the PackedObjectReader
is used instead, which doesn't implement readline
.
The following script provides a minimal working example:
#!/usr/bin/env python
import disk_objectstore
import pickle
import tempfile
with tempfile.TemporaryDirectory() as dirpath:
# Initialize a disk objectstore container
container = disk_objectstore.Container(dirpath)
container.init_container()
# Pickle an arbitrary object (a random string in this example) and write directly to a pack file
pickled_container = pickle.dumps('some-string')
if True:
key = container.add_objects_to_pack([pickled_container])[0]
else:
key = container.add_object(pickled_container)
# Retrieve the pickled byte stream and try to unpickle it
with container.get_object_stream(key) as handle:
pickle.load(handle)
If the conditional is True
it writes the pickled object directly to a pack and then the pickle.load
call will except
Traceback (most recent call last):
File "scripts/run_pickle_dos.py", line 21, in <module>
pickle.load(handle)
TypeError: file must have 'read' and 'readline' attributes
If we switch it to False
such that the pickled object is written to a loose file instead, the code will work just fine.
So I think we just need to implement readline
on the PackedObjectReader
class and we should be golden
Thanks @sphuber. I'll open a PR draft soon and give it a go.