ZeroDivisionError at the end of zip file
tropicoo opened this issue · 5 comments
Thanks for the lib.
Got ZeroDivisionError: integer division or modulo by zero while processing the zip file's last chunk, using the example code snippet:
Traceback (most recent call last):
  File "***\tmp.py", line 9, in <module>
    for file_name, file_size, unzipped_chunks in stream_unzip(zipped_chunks()):
  File "***\venv_win32_39py\lib\site-packages\stream_unzip.py", line 180, in stream_unzip
    for _ in yield_all():
  File "***\venv_win32_39py\lib\site-packages\stream_unzip.py", line 35, in _yield_all
    offset = (offset + to_yield) % len(chunk)
ZeroDivisionError: integer division or modulo by zero
Code snippet:
import httpx
from stream_unzip import stream_unzip

def zipped_chunks():
    # Any iterable that yields a zip file
    with httpx.stream('GET', 'https://www.gyan.dev/ffmpeg/builds/packages/ffmpeg-4.4-essentials_build.zip') as r:
        yield from r.iter_bytes()

for file_name, file_size, unzipped_chunks in stream_unzip(zipped_chunks()):
    for chunk in unzipped_chunks:
        # print(chunk)
        print(file_name, file_size)
Python 3.9.5 (Windows 10)
stream-unzip 0.0.23
Thanks for the report! I have reproduced the issue, and am investigating.
I'm not entirely sure that this isn't an issue with httpx... it surprises me it yields any zero-length chunks, even at the end.
I've started a discussion at encode/httpx#1733
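To make the failure mode concrete: the line shown in the traceback takes a modulo by len(chunk), which is zero for an empty bytes object. The following is a minimal sketch that reproduces only that arithmetic. The name next_offset is a hypothetical helper for illustration, not stream-unzip's actual function.

```python
def next_offset(offset, to_yield, chunk):
    # Same arithmetic as the failing line in the traceback.
    # Hypothetical helper for illustration, not the library's real code.
    return (offset + to_yield) % len(chunk)

# With a non-empty chunk, the offset wraps around at the chunk length.
wrapped = next_offset(2, 2, b'abcd')

# With a zero-length chunk, len(chunk) == 0, so the modulo raises.
try:
    next_offset(0, 0, b'')
    raised = False
except ZeroDivisionError:
    raised = True
```

So a single empty bytes object anywhere in the input iterable is enough to trigger the error, regardless of what the zip file contains.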
In the meantime, you can filter out zero-length chunks with an intermediate generator:
import httpx
from stream_unzip import stream_unzip

def without_zero_length(chunks):
    for chunk in chunks:
        if chunk:
            yield chunk

def zipped_chunks():
    with httpx.stream('GET', 'https://www.gyan.dev/ffmpeg/builds/packages/ffmpeg-4.4-essentials_build.zip') as r:
        yield from without_zero_length(r.iter_bytes())

for file_name, file_size, unzipped_chunks in stream_unzip(zipped_chunks()):
    for chunk in unzipped_chunks:
        print(file_name, file_size)
Ah, or in this case httpx's iter_raw also works, without filtering out zero-length chunks.
import httpx
from stream_unzip import stream_unzip

def zipped_chunks():
    with httpx.stream('GET', 'https://www.gyan.dev/ffmpeg/builds/packages/ffmpeg-4.4-essentials_build.zip') as r:
        yield from r.iter_raw()

for file_name, file_size, unzipped_chunks in stream_unzip(zipped_chunks()):
    for chunk in unzipped_chunks:
        print(file_name, file_size)
I suspect you can only replace iter_bytes with iter_raw if there isn't any additional content encoding on the HTTP response. It would be strange if there were, since the content is a zip file, but you never know...
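One way to guard against that case is to inspect the Content-Encoding header before choosing the raw bytes. This is a minimal sketch under stated assumptions: can_use_raw is a hypothetical helper, and it treats a missing header or "identity" as meaning no additional encoding.

```python
def can_use_raw(headers):
    """Return True if the raw body bytes should already be the zip bytes.

    `headers` is any mapping with a .get method, keyed by header name.
    Hypothetical helper for illustration, not part of httpx or stream-unzip.
    """
    encoding = headers.get('content-encoding', '').strip().lower()
    # No header, or "identity", means the body is not additionally encoded.
    return encoding in ('', 'identity')
```

With httpx, the response's headers mapping could be passed in directly, for example choosing r.iter_raw() when can_use_raw(r.headers) is true and falling back to the filtered r.iter_bytes() otherwise.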
Ah, found a more robust workaround: specifying chunk_size in the call to iter_bytes avoids the zero-length chunk that stream-unzip doesn't handle:
import httpx
from stream_unzip import stream_unzip

def zipped_chunks():
    with httpx.stream('GET', 'https://www.gyan.dev/ffmpeg/builds/packages/ffmpeg-4.4-essentials_build.zip') as r:
        yield from r.iter_bytes(chunk_size=65536)

for file_name, file_size, unzipped_chunks in stream_unzip(zipped_chunks()):
    for chunk in unzipped_chunks:
        print(file_name, file_size)
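The thread does not explain why passing chunk_size avoids the empty chunk. If you would rather not depend on that behavior, a re-chunking generator can give the same guarantee independently of the HTTP client. This is a minimal sketch: rechunk is a hypothetical helper, not part of httpx or stream-unzip.

```python
def rechunk(chunks, size=65536):
    """Yield non-empty chunks of exactly `size` bytes, except possibly the last.

    Hypothetical helper for illustration. Empty input chunks add nothing to
    the buffer, so they can never be passed through to the consumer.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit as many full-size pieces as the buffer currently holds.
        while len(buffer) >= size:
            yield bytes(buffer[:size])
            del buffer[:size]
    # Emit whatever is left over, but only if there is something left.
    if buffer:
        yield bytes(buffer)
```

It could wrap any source of bytes, for example rechunk(r.iter_bytes()), before the result is handed to stream_unzip.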
I've opted to not change the code, and instead change the README to have an example that works, and explicitly state that zero length chunks are not supported.
It's a tricky call to make, but all things being equal, I'm happier with the error since it indicates something is unexpected earlier in the processing.
[Users who don't want to fail on zero-length chunks can filter them out, as in the example above.]