rpm-software-management/createrepo_c

createrepo_c zstd compression doesn't fill in the content size, in the frame header. Python API problems.

james-antill opened this issue · 4 comments

createrepo_c's zstd compression doesn't fill in the content size in the frame header. This means that you can't call the Python API to decompress in the simple/usable way:

data = zstandard.decompress(zstd_data)

...because you'll get an exception:

  File "/usr/lib64/python3.11/site-packages/zstandard/__init__.py", line 210, in decompress
    return dctx.decompress(data, max_output_size=max_output_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
zstd.ZstdError: could not determine content size in frame header

...the only way to work around this is to guess at the output size and pass that random number to the decompress API call (as max_output_size).

See the documentation on the python API, esp. the 7th paragraph, here: https://python-zstandard.readthedocs.io/en/latest/decompressor.html#zstandard.ZstdDecompressor.decompress

...a simple testcase would be to generate a compressed file with createrepo_c and then call zstandard.decompress() on it from the standard Python API.

It kind of seems like a python-zstd issue?

sergey-dryabzhinsky/python-zstd#53 (comment)

Decompression fails where no content size is included in the frame (e.g. streaming)

...

(reply)

Yes, this module is simple and dumb. It never meant to support streaming compression. And I'll keep it this way.

createrepo_c uses streaming, so...
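To check whether a given file actually has the content size in its frame header, without depending on any zstd binding, the header can be read by hand. This is a sketch based on the Zstandard frame format as described in RFC 8878; `declared_content_size` is a hypothetical helper name, and it only looks at the first frame and does not handle skippable frames.

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # 0xFD2FB528, little-endian


def declared_content_size(frame: bytes):
    """Return the content size stored in the frame header, or None if absent."""
    if len(frame) < 5 or frame[:4] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    descriptor = frame[4]
    fcs_flag = descriptor >> 6             # bits 7-6: Frame_Content_Size_flag
    single_segment = (descriptor >> 5) & 1  # bit 5: Single_Segment_flag
    dict_id_flag = descriptor & 3           # bits 1-0: Dictionary_ID_flag
    # Flag 0 means "no field" unless the frame is single-segment (1 byte).
    fcs_len = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]
    if fcs_len == 0:
        return None
    # Skip the window descriptor (absent when single-segment) and dictionary ID.
    offset = 5 + (0 if single_segment else 1) + (0, 1, 2, 4)[dict_id_flag]
    field = frame[offset:offset + fcs_len]
    if len(field) < fcs_len:
        raise ValueError("truncated frame header")
    value = int.from_bytes(field, "little")
    # The 2-byte encoding stores the size minus 256.
    return value + 256 if fcs_len == 2 else value


# Hand-built headers: one without a size field, one declaring 5 bytes.
no_size = declared_content_size(ZSTD_MAGIC + b"\x00\x58")
with_size = declared_content_size(ZSTD_MAGIC + b"\x20\x05")
```

Calling it on the first few bytes of a createrepo_c output file should return None if the report above is accurate.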

Sidenote, I'll plug that it would be great to get zstd support into the Python standard library.

If it's too hard to fix createrepo_c then flag this as super hard, backlog, or just close it.

It's just kind of annoying when the only Python API for that compression is very hard to use correctly.

Does it significantly impact performance if you stream to a file and then compress?

Supposedly this works, since the streaming reader doesn't need the content size from the frame header:

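import io
import zstandard as zstd

# compressed: the zstd bytes to decode; data: the expected original bytes
with zstd.ZstdDecompressor().stream_reader(io.BytesIO(compressed)) as r:
    decompressed = r.read()
assert decompressed == data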

indygreg/python-zstandard#150