BAAI/bge-reranker-base startup error
Closed this issue · 9 comments
System Info
version 0.0.27 in pyproject.toml
Information
- Docker
- The CLI directly via pip
Tasks
- An officially supported command
- My own modifications
Reproduction
rda-reranker:
  container_name: rda-reranker
  image: michaelf34/infinity:latest
  ports:
    - "7997:7997"
  environment:
    HF_HUB_CACHE: /opt/models
  restart: unless-stopped
  volumes:
    - rda-reranker-data:/opt/models
  command: ['--model-name-or-path', 'BAAI/bge-reranker-base', '--port', '7997', '--engine', 'optimum']
  logging:
    driver: "json-file"
    options:
      max-size: 10m
      max-file: "3"
Expected behavior
The --engine optimum option triggers the issue. When run without this option, there is no error.
ERROR: Application startup failed. Exiting.
2024-05-11 18:44:54.540 | WARNING | fastembed.embedding:<module>:7 - DefaultEmbedding, FlagEmbedding, JinaEmbedding are deprecated. Use TextEmbedding instead.
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO 2024-05-11 18:44:54,774 infinity_emb INFO: model=`BAAI/bge-reranker-base` selected, using engine=`optimum` and device=`None` select_model.py:54
INFO 2024-05-11 18:44:55,094 infinity_emb INFO: Optimized model found at /root/.cache/huggingface/hub/infinity_onnx/BAAI/bge-reranker-base/model_optimized.onnx, skipping optimization utils_optimum.py:83
INFO 2024-05-11 18:44:57,227 infinity_emb INFO: Getting timings for batch_size=64 and avg tokens per sentence=5 select_model.py:77
1.60 ms tokenization
69.35 ms inference
0.02 ms post-processing
70.97 ms total
embeddings/sec: 901.76
2024-05-11 18:44:57.270241422 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Gather node. Name:'/roberta/embeddings/position_embeddings/Gather' Status Message: indices element out of data bounds, idx=514 must be within the inclusive range [-514,513]
0.0.32 fails in the same way.
@andrew-at-rise This is the same issue as #127
Ah. Well then, I guess I need to find a different model? They don't appear to be working on a fix.
No, as you can see, I submitted a PR to Hugging Face and fixed their issue.
I'm sorry, I don't understand. I deleted my model cache, downloaded a new version of bge-reranker-base and it fails in the same way.
I'm really confused. I am trying to run BAAI/bge-reranker-base. The thing you fixed (?) is maidalun1020/bce-reranker-base_v1.
Yeah..🤦
What I am saying is: you need to fix BAAI/bge-reranker-base. Its optimum model is not compatible with 514 encodings, only 512.
I would suggest Xenova: https://huggingface.co/Xenova/bge-reranker-base/blob/main/config.json, which should work better with 512 encodings. If not, please open a PR on the model repo. Thanks.
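For anyone reading along, here is a rough sketch of why index 514 falls outside a 514-row position table. It assumes the RoBERTa-style convention used in Hugging Face transformers, where real tokens get position ids starting at padding_idx + 1 and padding_idx is 1. The table size of 514 is taken from the valid range [-514,513] in the error above. The function names are made up for illustration and are not part of infinity or optimum.

```python
def roberta_position_ids(attention_mask, padding_idx=1):
    """Number real tokens from padding_idx + 1; padded slots get padding_idx.

    This mirrors the RoBERTa-style convention (an assumption here),
    written in plain Python instead of tensor ops.
    """
    ids = []
    seen = 0
    for m in attention_mask:
        if m:
            seen += 1
            ids.append(seen + padding_idx)
        else:
            ids.append(padding_idx)
    return ids


def max_safe_tokens(num_position_rows, padding_idx=1):
    """Longest sequence whose largest position id still fits in the table."""
    # Valid row indices are 0 .. num_position_rows - 1, and the n-th real
    # token is assigned id n + padding_idx.
    return num_position_rows - 1 - padding_idx


TABLE_ROWS = 514  # from the error message: valid range [-514, 513]

limit = max_safe_tokens(TABLE_ROWS)
ok_max = max(roberta_position_ids([1] * limit))
bad_max = max(roberta_position_ids([1] * (limit + 1)))

print(f"max tokens: {limit}, highest id at limit: {ok_max}, one more token: {bad_max}")
```

Under these assumptions the table holds at most 512 tokens, and a 513-token input asks for row 514, which matches the idx=514 in the Gather error. So anything that feeds the model more than 512 tokens (for example a warm-up run at a declared maximum length of 514) would trip it.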
$ docker run --rm -p 7777:7777 michaelf34/infinity:0.0.32 --model-name-or-path Xenova/bge-reranker-base --port 7777 --engine optimum
...
INFO 2024-05-15 19:30:04,601 onnx_model INFO: Model saved to /app/.cache/huggingface/hub/infinity_onnx/Xenova/bge-reranker-base/model_quantized_optimized.onnx onnx_model.py:1182
Configuration saved in /app/.cache/huggingface/hub/infinity_onnx/Xenova/bge-reranker-base/ort_config.json
Optimized model saved at: /app/.cache/huggingface/hub/infinity_onnx/Xenova/bge-reranker-base (external data format: False; saved all tensor to one file: True)
The ONNX file model_quantized_optimized.onnx is not a regular name used in optimum.onnxruntime, the ORTModel might not behave as expected.
INFO 2024-05-15 19:30:06,476 infinity_emb INFO: Getting timings for batch_size=32 and avg tokens per sentence=5 select_model.py:77
0.96 ms tokenization
17.78 ms inference
0.02 ms post-processing
18.76 ms total
embeddings/sec: 1705.79
2024-05-15 19:30:06.501370706 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Gather node. Name:'/roberta/embeddings/position_embeddings/Gather' Status Message: indices element out of data bounds, idx=514 must be within the inclusive range [-514,513]
Same issue apparently. :(
I am not sure CPU with optimum is even going to be viable. I am looking for a way to do fast reranking without using a GPU, if possible.
@andrew-at-rise - please just open a PR, get involved and fix things. That is really the way to go, and it's quite easy. I did this for you, see the linked PR below.
Wait until this is merged, or upload your own model to Hugging Face.
https://huggingface.co/Xenova/bge-reranker-base/discussions/2