Linewise iteration over compressed file
dreamflasher opened this issue · 5 comments
With gzip.open() the following is possible:
    with gzip.open(filename, 'rb') as f:
        for line in f:
            ...  # process each line
How do I achieve this with lz4framed? (And I think this question is general enough that this should be integrated into the library)
There's BytesIO, but it only supports an initial chunk. The question is essentially: how do I stream the chunks from lz4 into a BytesIO so that I can read line by line?
See either the gzip implementation for how they do it, or e.g. something like this (just an example, I won't debug it myself if it doesn't work properly!):
https://gist.github.com/vtermanis/739e34b8a4e4b323932996a288fa1001#file-lz4framed_file_reader_example-py-L126
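For readers who only need line-wise iteration, a smaller alternative to a full file-like wrapper is a generator that stitches lines together across chunk boundaries. The sketch below is not the code from the gist; `iter_lines` is a hypothetical helper name, and it works on any iterable of `bytes` chunks. It assumes the decompressor can be iterated to yield decompressed chunks, which is how the lz4framed README appears to use `Decompressor`.

```python
def iter_lines(chunks):
    """Yield lines (with their trailing b"\n") from an iterable of bytes chunks.

    A line may be split across two or more chunks, so the unfinished tail
    of each chunk is carried over and prepended to the next one.
    """
    pending = b""
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; the rest is a partial line.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line + b"\n"
    if pending:
        # Final line without a trailing newline.
        yield pending


# Assumed usage with lz4framed (not executed here; Decompressor is assumed
# to yield decompressed bytes chunks when iterated):
#
#     import lz4framed
#     with open(filename, "rb") as f:
#         for line in iter_lines(lz4framed.Decompressor(f)):
#             ...  # process each line
```

Like iterating over a file opened with `gzip.open(filename, 'rb')`, this yields `bytes`; decode each line explicitly (for example `line.decode("utf-8")`) rather than decoding individual chunks, since a chunk can end in the middle of a multi-byte character.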
Oh wow, you are amazing! Thank you for writing this up. It doesn't work properly (it stops reading after some time, and there are encoding problems), but that's totally okay. I'll look into it tomorrow and try to debug it.
You're welcome - maybe worth looking at exactly how you want/need to read the data and making a simpler, more specific solution for your application (e.g. using the compress_(begin|update|end) calls directly).
If you think there's a bug in the Compressor class (not part of example), feel free to raise as another issue.
Ultimately, because I work full-time (we open-sourced this project just so others can use it) and because this module is just supposed to expose the C API, don't expect too much support in terms of adding stuff on top, sorry.
I'll close this issue for now. If you or anyone else comes up with something you think is worth adding to this module, feel free to open a pull request!
@dreamflasher - I found a bug in the gist I posted (SEEK_END for BytesIO) - it should not stop reading when decompressing anymore.
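The exact fix in the gist is not reproduced here, but the general pitfall with using a single BytesIO as both write and read buffer can be illustrated as follows. A BytesIO has one shared position, so new data has to be written at the end (`SEEK_END`) and the read position restored afterwards. `feed` is a hypothetical helper name used only for this sketch.

```python
import io


def feed(buf, data):
    """Append data to the end of buf without disturbing the read position."""
    read_pos = buf.tell()        # remember where the reader currently is
    buf.seek(0, io.SEEK_END)     # new data must go to the end of the buffer
    buf.write(data)
    buf.seek(read_pos)           # put the reader back where it was


buf = io.BytesIO()
feed(buf, b"first\nsec")         # second line is still incomplete
first = buf.readline()           # reads up to and including the first newline
feed(buf, b"ond\n")              # the rest of the second line arrives
second = buf.readline()
```

Without the seek to the end, the second write would land at the current read position, overwriting unread bytes and leaving the position at the end of the buffer, so later reads return nothing.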
Amazing, thank you very much!