uktrade/stream-unzip

ZeroDivisionError in the end of zip file

tropicoo opened this issue · 5 comments

Thanks for the lib.
Got ZeroDivisionError: integer division or modulo by zero while processing the zip file's last chunk, using the example code snippet:

Traceback (most recent call last):
  File "***\tmp.py", line 9, in <module>
    for file_name, file_size, unzipped_chunks in stream_unzip(zipped_chunks()):
  File "***\venv_win32_39py\lib\site-packages\stream_unzip.py", line 180, in stream_unzip
    for _ in yield_all():
  File "***\venv_win32_39py\lib\site-packages\stream_unzip.py", line 35, in _yield_all
    offset = (offset + to_yield) % len(chunk)
ZeroDivisionError: integer division or modulo by zero

Code snippet:

import httpx
from stream_unzip import stream_unzip


def zipped_chunks():
    # Any iterable that yields a zip file
    with httpx.stream('GET', 'https://www.gyan.dev/ffmpeg/builds/packages/ffmpeg-4.4-essentials_build.zip') as r:
        yield from r.iter_bytes()


for file_name, file_size, unzipped_chunks in stream_unzip(zipped_chunks()):
    for chunk in unzipped_chunks:
        # print(chunk)
        print(file_name, file_size)

Python 3.9.5 (Windows 10)
stream-unzip 0.0.23

Thanks for the report! I have reproduced the issue, and am investigating.

I'm not entirely sure that this isn't an issue with httpx... it surprises me it yields any zero-length chunks, even at the end.
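For context, the failing line in the traceback takes a modulo by the length of the current chunk, so a single empty chunk is enough to trigger the error. The sketch below only mirrors that one expression. It is not the library's actual code, and `advance_offset` is a hypothetical name used for illustration:

```python
def advance_offset(offset, to_yield, chunk):
    # Same expression as the line shown in the traceback:
    #   offset = (offset + to_yield) % len(chunk)
    return (offset + to_yield) % len(chunk)


# A non-empty chunk works as expected
print(advance_offset(0, 3, b'abcdef'))

# An empty chunk has length 0, so the modulo raises
try:
    advance_offset(0, 0, b'')
except ZeroDivisionError as e:
    print('ZeroDivisionError:', e)
```

This can be run without any network access, which makes it easier to see that the error depends only on the chunk being empty, not on where in the stream it appears.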

I've started a discussion at encode/httpx#1733

In the meantime, you can filter out zero-length chunks with an intermediate generator:

import httpx
from stream_unzip import stream_unzip

def without_zero_length(chunks):
    for chunk in chunks:
        if chunk:
            yield chunk

def zipped_chunks():
    with httpx.stream('GET', 'https://www.gyan.dev/ffmpeg/builds/packages/ffmpeg-4.4-essentials_build.zip') as r:
        yield from without_zero_length(r.iter_bytes())

for file_name, file_size, unzipped_chunks in stream_unzip(zipped_chunks()):
    for chunk in unzipped_chunks:
        print(file_name, file_size)

Ah, or in this case httpx's iter_raw also works, without needing to filter out zero-length chunks:

import httpx
from stream_unzip import stream_unzip

def zipped_chunks():
    with httpx.stream('GET', 'https://www.gyan.dev/ffmpeg/builds/packages/ffmpeg-4.4-essentials_build.zip') as r:
        yield from r.iter_raw()

for file_name, file_size, unzipped_chunks in stream_unzip(zipped_chunks()):
    for chunk in unzipped_chunks:
        print(file_name, file_size)

I suspect you can only replace iter_bytes with iter_raw if there isn't any additional content encoding on the HTTP response. It would be strange if there were, since the content is a zip file, but you never know...
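One way to guard against that case is to look at the Content-Encoding header before choosing iter_raw. This is only a sketch: `raw_body_is_unencoded` is a hypothetical helper, and it assumes that a missing header, an empty header, or `identity` all mean the raw bytes are the zip bytes:

```python
def raw_body_is_unencoded(content_encoding):
    """Return True if the raw response body needs no content decoding.

    content_encoding is the value of the Content-Encoding header,
    or None if the header is absent.
    """
    # Assumption: absent, empty, or "identity" means no transformation
    return (content_encoding or '').strip().lower() in ('', 'identity')


print(raw_body_is_unencoded(None))    # header absent
print(raw_body_is_unencoded('gzip'))  # body is gzip-encoded on the wire
```

With httpx, the value to pass in would be something like `r.headers.get('content-encoding')` on the streamed response; if the helper returns False, iter_bytes is the safer choice since it applies the decoding.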

Ah, found a more robust workaround: specifying chunk_size in the call to iter_bytes makes it avoid the zero-length chunk that stream-unzip doesn't handle:

import httpx
from stream_unzip import stream_unzip

def zipped_chunks():
    with httpx.stream('GET', 'https://www.gyan.dev/ffmpeg/builds/packages/ffmpeg-4.4-essentials_build.zip') as r:
        yield from r.iter_bytes(chunk_size=65536)

for file_name, file_size, unzipped_chunks in stream_unzip(zipped_chunks()):
    for chunk in unzipped_chunks:
        print(file_name, file_size)

I've opted to not change the code, and instead change the README to have an example that works, and explicitly state that zero length chunks are not supported.

It's a tricky call to make, but all things being equal, I'm happier with the error since it indicates something is unexpected earlier in the processing.

If users do not want to fail on zero-length chunks, they can filter them out as in the example above.
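The filtering approach can be checked offline with in-memory chunks. The sketch below reuses the `without_zero_length` generator from earlier in the thread; the sample chunks are made up for illustration:

```python
def without_zero_length(chunks):
    # Pass through only non-empty chunks
    for chunk in chunks:
        if chunk:
            yield chunk


# Made-up input with empty chunks in the middle and at the end
chunks = [b'ab', b'', b'cd', b'']
filtered = list(without_zero_length(chunks))

print(filtered)

# Dropping empty chunks does not change the concatenated bytes
assert b''.join(filtered) == b''.join(chunks)
```

Since empty chunks contribute no bytes, the data seen by stream_unzip is identical; only the chunk boundaries differ.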