michaelfeil/infinity

BAAI/bge-reranker-base startup error

Closed this issue · 9 comments

System Info

version 0.0.27 in pyproject.toml

Information

  • Docker
  • The CLI directly via pip

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  rda-reranker:
    container_name: rda-reranker
    image: michaelf34/infinity:latest
    ports:
      - "7997:7997"
    environment:
      HF_HUB_CACHE: /opt/models
    restart: unless-stopped
    volumes:
      - rda-reranker-data:/opt/models
    command: ['--model-name-or-path', 'BAAI/bge-reranker-base', '--port', '7997', '--engine', 'optimum']
    logging:
      driver: "json-file"
      options:
        max-size: 10m
        max-file: "3"

Expected behavior

Passing `--engine optimum` triggers the issue; when run without this option, there is no error.

ERROR:    Application startup failed. Exiting.
2024-05-11 18:44:54.540 | WARNING  | fastembed.embedding:<module>:7 - DefaultEmbedding, FlagEmbedding, JinaEmbedding are deprecated. Use TextEmbedding instead.
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO     2024-05-11 18:44:54,774 infinity_emb INFO:           select_model.py:54
         model=`BAAI/bge-reranker-base` selected, using                         
         engine=`optimum` and device=`None`                                     
INFO     2024-05-11 18:44:55,094 infinity_emb INFO:          utils_optimum.py:83
         Optimized model found at                                               
         /root/.cache/huggingface/hub/infinity_onnx/BAAI/bge                    
         -reranker-base/model_optimized.onnx, skipping                          
         optimization                                                           
INFO     2024-05-11 18:44:57,227 infinity_emb INFO: Getting   select_model.py:77
         timings for batch_size=64 and avg tokens per                           
         sentence=5                                                             
                 1.60     ms tokenization                                       
                 69.35    ms inference                                          
                 0.02     ms post-processing                                    
                 70.97    ms total                                              
         embeddings/sec: 901.76                                                 
2024-05-11 18:44:57.270241422 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Gather node. Name:'/roberta/embeddings/position_embeddings/Gather' Status Message: indices element out of data bounds, idx=514 must be within the inclusive range [-514,513]

0.0.32 fails in the same way.
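For context on the out-of-bounds index: RoBERTa-style models compute position ids starting at `pad_token_id + 1` (i.e. 2), so a position table with 514 rows (valid indices 0..513) only supports 512 usable token positions. A minimal sketch of that arithmetic (the helper names are mine, not from infinity):

```python
# Why idx=514 goes out of bounds for a RoBERTa-style model:
# position ids are offset by padding_idx + 1 (padding_idx = 1 by convention),
# so a sequence of length L occupies positions 2 .. L+1.
# With max_position_embeddings = 514, the largest valid index is 513,
# which caps the usable sequence length at 512 tokens.

PADDING_IDX = 1                  # RoBERTa convention
MAX_POSITION_EMBEDDINGS = 514    # from BAAI/bge-reranker-base config.json

def max_usable_seq_len(max_pos: int, padding_idx: int) -> int:
    # Positions run from padding_idx + 1 up to padding_idx + seq_len.
    return max_pos - padding_idx - 1

def position_ids(seq_len: int, padding_idx: int) -> list[int]:
    return [padding_idx + 1 + i for i in range(seq_len)]

print(max_usable_seq_len(MAX_POSITION_EMBEDDINGS, PADDING_IDX))  # 512
print(position_ids(3, PADDING_IDX))                              # [2, 3, 4]
print(max(position_ids(513, PADDING_IDX)))  # 514 -> out of range for 514 rows
```

A 513-token sequence already produces index 514, exactly the `idx=514 must be within the inclusive range [-514,513]` failure in the log above.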

@andrew-at-rise This is the same issue as #127

Ah. Well then, I guess I need to find a different model? They don't appear to be working on a fix.

No, as you can see, I submitted a PR to Hugging Face and fixed their issue.

I'm sorry, I don't understand. I deleted my model cache, downloaded a new version of bge-reranker-base and it fails in the same way.

I'm really confused. I am trying to run BAAI/bge-reranker-base. The thing you fixed (?) is maidalun1020/bce-reranker-base_v1.

Yeah... 🤦

What I am saying is: you need to fix BAAI/bge-reranker-base itself; its optimum model supports only 512 position encodings, not 514.

I would suggest Xenova: https://huggingface.co/Xenova/bge-reranker-base/blob/main/config.json, which should work better with 512 encodings. If not, please open a PR on the model repo. Thanks.
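The suggestion above boils down to making the tokenizer's limit fit inside the model's position table. A hedged sketch of that check (not infinity's actual startup logic; the dicts stand in for `json.load()`-ed `config.json` and `tokenizer_config.json`, and the field names follow Hugging Face conventions):

```python
# Sanity-check that the tokenizer cannot emit sequences longer than the
# position-embedding table can hold, assuming a RoBERTa-style offset.

config = {
    "max_position_embeddings": 514,  # value discussed in this thread
    "pad_token_id": 1,               # RoBERTa-style padding index
}
tokenizer_config = {
    "model_max_length": 512,
}

def usable_positions(cfg: dict) -> int:
    # RoBERTa offsets positions by pad_token_id + 1, so two slots are lost.
    return cfg["max_position_embeddings"] - cfg["pad_token_id"] - 1

limit = usable_positions(config)
if tokenizer_config["model_max_length"] > limit:
    print(f"mismatch: tokenizer allows more than {limit} positions")
else:
    print(f"ok: {tokenizer_config['model_max_length']} <= {limit}")
```

If the tokenizer's `model_max_length` exceeds the usable positions, long inputs will hit exactly the Gather error shown in the logs.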

$ docker run --rm -p 7777:7777 michaelf34/infinity:0.0.32 --model-name-or-path Xenova/bge-reranker-base --port 7777 --engine optimum
...
INFO     2024-05-15 19:30:04,601 onnx_model INFO: Model saved onnx_model.py:1182
         to                                                                     
         /app/.cache/huggingface/hub/infinity_onnx/Xenova/bge                   
         -reranker-base/model_quantized_optimized.onnx                          
Configuration saved in /app/.cache/huggingface/hub/infinity_onnx/Xenova/bge-reranker-base/ort_config.json
Optimized model saved at: /app/.cache/huggingface/hub/infinity_onnx/Xenova/bge-reranker-base (external data format: False; saved all tensor to one file: True)
The ONNX file model_quantized_optimized.onnx is not a regular name used in optimum.onnxruntime, the ORTModel might not behave as expected.
INFO     2024-05-15 19:30:06,476 infinity_emb INFO: Getting   select_model.py:77
         timings for batch_size=32 and avg tokens per                           
         sentence=5                                                             
                 0.96     ms tokenization                                       
                 17.78    ms inference                                          
                 0.02     ms post-processing                                    
                 18.76    ms total                                              
         embeddings/sec: 1705.79                                                
2024-05-15 19:30:06.501370706 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Gather node. Name:'/roberta/embeddings/position_embeddings/Gather' Status Message: indices element out of data bounds, idx=514 must be within the inclusive range [-514,513]

Same issue apparently. :(

I am not sure CPU with optimum is even going to be viable. I am looking for a way to do fast reranking without using a GPU, if possible.

@andrew-at-rise - please just open a PR, get involved, and fix things. Really the way to go; it's quite easy. I did this for you, see the linked PR below.
Wait until it is merged, or upload your own model to Hugging Face.

https://huggingface.co/Xenova/bge-reranker-base/discussions/2