OverflowError when scanning certain HF repos
csmizzle opened this issue · 4 comments
Describe the bug
When running modelscan -hf from the CLI, response.read() in /modelscan/tools/utils.py throws an OverflowError. A simple fix would be to catch the error and read the file in chunks, something like:
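...
try:
    return response.read()
except OverflowError:
    # Fall back to bounded reads so no single read() asks for the whole body.
    chunk_size = 16 * 1024
    file_size = response.length
    data = bytearray()
    while len(data) != file_size:
        # Keep the size and the bytes read in separate variables.
        chunk = response.read(chunk_size)
        if not chunk:
            break
        data.extend(chunk)
    return bytes(data)
...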
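For anyone who wants to try the chunked read in isolation, here is a minimal standalone sketch. The helper name read_all_chunked is made up for illustration and is not part of modelscan; it assumes only that the response behaves like a file object with a read(n) method, so an io.BytesIO stands in for the HTTP response.

```python
import io


def read_all_chunked(stream, chunk_size=16 * 1024):
    """Read a file-like object to the end using bounded read() calls.

    Hypothetical helper for illustration; not part of modelscan.
    """
    data = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        data.extend(chunk)
    return bytes(data)


# Stand-in for the HTTP response: any object with read(n) works.
fake_response = io.BytesIO(b"x" * 50_000)
payload = read_all_chunked(fake_response)
```

Looping until read() returns empty bytes avoids depending on response.length, which may be unset when the server does not send a Content-Length header.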
To Reproduce
Steps to reproduce the behavior:
- Use arguments '-hf'
- With model 'stabilityai/stable-diffusion-xl-base-1.0'
- See error
(modelscan-py3.9) ➜ modelscan git:(main) modelscan -hf stabilityai/stable-diffusion-xl-base-1.0
Exception: signed integer is greater than maximum
Traceback (most recent call last):
File "/Users/csmizzle/Desktop/werk/modelscan/modelscan/cli.py", line 72, in cli
modelscan.scan_huggingface_model(huggingface)
File "/Users/csmizzle/Desktop/werk/modelscan/modelscan/modelscan.py", line 76, in scan_huggingface_model
data = io.BytesIO(_http_get(url))
File "/Users/csmizzle/Desktop/werk/modelscan/modelscan/tools/utils.py", line 110, in _http_get
return _http_get(response.headers["Location"])
File "/Users/csmizzle/Desktop/werk/modelscan/modelscan/tools/utils.py", line 115, in _http_get
return response.read()
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/http/client.py", line 472, in read
s = self._safe_read(self.length)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/http/client.py", line 613, in _safe_read
data = self.fp.read(amt)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/socket.py", line 704, in readinto
return self._sock.recv_into(b)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
OverflowError: signed integer is greater than maximum
Expected behavior
Large files are streamed in chunks instead of being read in a single call.
Environment (please complete the following information):
- macOS 13.4.1 (22F82)
- Modelscan Version 0.0.0
Happy to take this on.
The above fix leads to the output below. The magic number check is failing for PyTorch models; other scans are working. Will continue to test.
(modelscan-py3.9) ➜ modelscan git:(43-overflowerror) ✗ modelscan -hf stabilityai/stable-diffusion-xl-base-1.0
Scanning https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/text_encoder/openvino_model.bin using pytorch model scan
Scanning https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/text_encoder_2/openvino_model.bin using pytorch model scan
Scanning https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/unet/openvino_model.bin using pytorch model scan
Scanning https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/vae_decoder/openvino_model.bin using pytorch model scan
Scanning https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/vae_encoder/openvino_model.bin using pytorch model scan
--- Summary ---
No issues found! 🎉
--- Errors ---
Error 1:
The following error was raised during a pytorch scan:
Invalid magic number for file https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/text_encoder/openvino_model.bin
Error 2:
The following error was raised during a pytorch scan:
Invalid magic number for file https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/text_encoder_2/openvino_model.bin
Error 3:
The following error was raised during a pytorch scan:
Invalid magic number for file https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/unet/openvino_model.bin
Error 4:
The following error was raised during a pytorch scan:
Invalid magic number for file https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/vae_decoder/openvino_model.bin
Error 5:
The following error was raised during a pytorch scan:
Invalid magic number for file https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/vae_encoder/openvino_model.bin
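The log above shows the openvino_model.bin files being handed to the PyTorch scan. One way to investigate would be a cheap look at the first few bytes of each file before scanning it. The sketch below is a hypothetical helper, not modelscan's actual magic number check. It assumes a PyTorch checkpoint is either a zip archive or a pickle stream written with protocol 2 or later, and the function and constant names are made up for illustration.

```python
import io
import pickle
import zipfile

ZIP_SIGNATURE = b"PK\x03\x04"   # local file header that opens a zip archive
PICKLE_PROTO_OPCODE = b"\x80"   # PROTO opcode that opens pickle protocol 2+


def looks_like_pytorch_file(header: bytes) -> bool:
    """Heuristic: True if the leading bytes resemble a zip archive or a pickle stream.

    Assumption for illustration only; this is not how modelscan validates files.
    """
    return header.startswith(ZIP_SIGNATURE) or header.startswith(PICKLE_PROTO_OPCODE)


# Build two small samples in memory to exercise the check.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("data.pkl", b"placeholder")
zip_header = zip_buffer.getvalue()[:8]

pickle_header = pickle.dumps({"weights": [1, 2, 3]}, protocol=2)[:8]
```

A file that fails this check is probably not a PyTorch checkpoint at all, which would point at how .bin files are routed to scanners rather than at the download code.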
@csmizzle Thanks for raising the issue and for the detailed error report. There is a PR (#39) under review for fixing HuggingFace (HF) model downloads. It replaces the http library with the requests library to fetch the models. The requests library takes care of a lot of the issues when downloading HF models, including URL escaping, redirects, and overflow. I ran the PR against the stabilityai/stable-diffusion-xl-base-1.0 model and there was no overflow issue. However, the invalid magic number seems to be an unrelated problem.
If you are happy with the solution in #39, we can close this issue once the PR is approved and merged. The invalid magic number problem can get its own issue.
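For readers following along, a rough sketch of what a requests-based download can look like is below. This is not the code from #39; the function names are made up for illustration, and the chunk size and timeout are arbitrary. It relies on requests following redirects by default for GET, and on stream=True with iter_content to read the body in pieces.

```python
def join_chunks(chunks):
    """Concatenate an iterable of bytes chunks, skipping empty ones."""
    data = bytearray()
    for chunk in chunks:
        if chunk:
            data.extend(chunk)
    return bytes(data)


def fetch_model_bytes(url, chunk_size=1024 * 1024, timeout=30):
    """Download url with requests and return the body as bytes.

    Hypothetical helper for illustration; not the implementation in the PR.
    """
    # Third-party dependency, imported here so join_chunks works without it.
    import requests

    # stream=True defers reading the body until iter_content is consumed.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        return join_chunks(response.iter_content(chunk_size=chunk_size))
```

Since the body is still collected in memory, very large models may be better written to a temporary file chunk by chunk instead.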
@iamfaisalkhan awesome, thanks for the update. nice work in #39. will close this up! magic number error likely goes away with the proposed work in #39. will avoid opening new issue for now.