bpemb model download fails
Describe the bug
When using the new ghcr.io/graal-research/deepparse:0.9.10 docker image (after fixing #231) the following error occurs:
app-1 | 2024-07-08 13:56:01,032; DEBUG: Starting new HTTPS connection (1): bpemb.h-its.org:443
app-1 | 2024-07-08 13:56:01,126; DEBUG: https://bpemb.h-its.org:443 "GET /multi/multi.wiki.bpe.vs100000.model HTTP/11" 404 196
app-1 | ERROR: Traceback (most recent call last):
app-1 | File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 677, in lifespan
app-1 | async with self.lifespan_context(app) as maybe_state:
app-1 | File "/usr/local/lib/python3.11/contextlib.py", line 204, in __aenter__
app-1 | return await anext(self.gen)
app-1 | ^^^^^^^^^^^^^^^^^^^^^
app-1 | File "/deepparse/app/app.py", line 31, in lifespan
app-1 | download_models()
app-1 | File "/deepparse/download_tools.py", line 106, in download_models
app-1 | download_model(model_type, saving_cache_path=saving_cache_path)
app-1 | File "/deepparse/download_tools.py", line 130, in download_model
app-1 | BPEmb(
app-1 | File "/usr/local/lib/python3.11/site-packages/bpemb/bpemb.py", line 173, in __init__
app-1 | self.model_file = self._load_file(model_file)
app-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
app-1 | File "/usr/local/lib/python3.11/site-packages/bpemb/bpemb.py", line 228, in _load_file
app-1 | return http_get(file_url, cached_file, ignore_tardir=True)
app-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
app-1 | File "/usr/local/lib/python3.11/site-packages/bpemb/util.py", line 48, in http_get
app-1 | headers = http_get_temp(url, temp_file)
app-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
app-1 | File "/usr/local/lib/python3.11/site-packages/bpemb/util.py", line 25, in http_get_temp
app-1 | req.raise_for_status()
app-1 | File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
app-1 | raise HTTPError(http_error_msg, response=self)
app-1 | requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://bpemb.h-its.org/multi/multi.wiki.bpe.vs100000.model
app-1 |
app-1 | ERROR: Application startup failed. Exiting.
app-1 exited with code 3
To Reproduce
Run the provided docker-compose.yml with the ghcr.io/graal-research/deepparse:0.9.10 image.
Expected behavior
The model is downloaded and the container starts normally.
Desktop (please complete the following information):
- Docker
- Version 0.9.10
I saw that a fix was attempted, but the download still doesn't point to the correct path: https://bpemb.h-its.org/multi/multi/multi.wiki.bpe.vs100000.model
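To make the difference between the two URLs concrete, here is a minimal sketch. It assumes the download URL is built by joining a base URL with a `{lang}/{lang}.wiki.bpe.vs{vs}.model` path template; the template and the `build_model_url` helper are illustrative assumptions, not code copied from bpemb or deepparse.

```python
# Assumed path template; the real one lives inside bpemb and may differ.
MODEL_TEMPLATE = "{lang}/{lang}.wiki.bpe.vs{vs}.model"


def build_model_url(base_url: str, lang: str = "multi", vs: int = 100000) -> str:
    """Join a base URL with the (assumed) model path template."""
    return base_url.rstrip("/") + "/" + MODEL_TEMPLATE.format(lang=lang, vs=vs)


# Base URL without the extra segment: reproduces the URL from the 404 in the log.
failing_url = build_model_url("https://bpemb.h-its.org")

# Base URL with an extra "multi/" segment: gives the path reported as correct.
working_url = build_model_url("https://bpemb.h-its.org/multi")

print(failing_url)
print(working_url)
```

Under that assumption, the only difference between the failing and the working request is the base URL, which would explain why a fix applied in one code path does not help another code path that still uses the default.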
It was fixed; the docker image seems to be the wrong one. It is fixed with 0.9.11.
Unfortunately, the error still occurs.
The fix applied in bpemb_embeddings_model.py with BPEmbBaseURLWrapperBugFix also needs to be applied in download_tools.py#L130.
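A rough sketch of the idea, using a stand-in class rather than the real `bpemb.BPEmb` so that it runs without network access. `FakeBPEmb`, `BaseURLFix`, and the two functions are hypothetical names for illustration; the actual `BPEmbBaseURLWrapperBugFix` in deepparse may be implemented differently.

```python
class FakeBPEmb:
    """Stand-in for bpemb.BPEmb (hypothetical; only builds the URL, no download)."""

    base_url = "https://bpemb.h-its.org/"

    def __init__(self, lang: str, vs: int):
        self.model_url = f"{self.base_url}{lang}/{lang}.wiki.bpe.vs{vs}.model"


class BaseURLFix(FakeBPEmb):
    """Wrapper that only overrides the base URL (assumed shape of the fix)."""

    base_url = "https://bpemb.h-its.org/multi/"


def load_embeddings_model() -> FakeBPEmb:
    # Code path that already used the wrapper.
    return BaseURLFix(lang="multi", vs=100000)


def download_model() -> FakeBPEmb:
    # Code path from the traceback: it must use the same wrapper,
    # otherwise it still requests the URL that returns 404.
    return BaseURLFix(lang="multi", vs=100000)


print(download_model().model_url)
```

The point of the sketch is that every place that instantiates the embeddings class has to go through the same wrapper; patching only one call site leaves the other requesting the old URL.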
You are right. It is fixed in 0.9.12.