huggingface/text-embeddings-inference
A blazing-fast inference solution for text embedding models
Rust · Apache-2.0
Issues
Support jinaai/jina-embeddings-v3
#418 opened by luonist - 0
Using CPU Image without ONNX
#388 opened by ramipellumbi - 1
Health check endpoint is also locked behind the API key once API_KEY is set, causing health checks to fail
#427 opened by Jason-CKY - 4
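For #427 above, a minimal probe sketch, assuming the server was started with an API key and that TEI's `/health` route and `Authorization: Bearer` scheme apply (the URL and env var names here are illustrative):

```python
import os
import requests

# Health probe for a TEI instance started with an API key.
# Per issue #427, /health also returns 401 once an API key is set, so a
# plain liveness probe fails; sending the bearer token works around it.
TEI_URL = os.environ.get("TEI_URL", "http://localhost:8080")
API_KEY = os.environ["API_KEY"]

resp = requests.get(
    f"{TEI_URL}/health",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=5,
)
resp.raise_for_status()
print("healthy")
```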
Download of BAAI/bge-m3 fails on 1.5 using ONNX
#417 opened by avvertix - 2
Inconsistent Embeddings with SentenceTransformer
#420 opened by lytning98 - 0
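An illustrative reproduction sketch for reports like #420: embed the same text locally with SentenceTransformer and via a running TEI server, then compare. The model name and TEI URL are placeholders, not taken from the issue:

```python
import numpy as np
import requests
from sentence_transformers import SentenceTransformer

MODEL = "BAAI/bge-base-en-v1.5"  # placeholder model
TEXT = "What is Deep Learning?"

# Local reference embedding, normalized.
local = SentenceTransformer(MODEL).encode(TEXT, normalize_embeddings=True)

# Embedding from the TEI server; /embed returns one vector per input.
served = np.array(
    requests.post(
        "http://localhost:8080/embed", json={"inputs": TEXT}, timeout=30
    ).json()[0]
)
served /= np.linalg.norm(served)

# Cosine similarity should be ~1.0 if the two pipelines agree.
print("cosine similarity:", float(local @ served))
```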
Better HTTP Status Code for Empty Requests
#365 opened by mbbyn - 4
Support Alibaba-NLP/gte-multilingual-base & Alibaba-NLP/gte-multilingual-reranker-base
#366 opened by sigridjineth - 5
C API / C wrapper API
#370 opened by 0110G - 1
How to support a SequenceClassification model
#371 opened by homily707 - 0
/embed_sparse Parameters
#372 opened by dbc-2024 - 3
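A sketch of a call to the `/embed_sparse` route asked about in #372, which serves SPLADE-style sparse embedding models; the response shape noted in the comment is an assumption based on common TEI usage, not taken from the issue:

```python
import requests

resp = requests.post(
    "http://localhost:8080/embed_sparse",
    json={"inputs": "What is Deep Learning?"},
    timeout=30,
)
resp.raise_for_status()

# Expected: one sparse vector per input, as {"index": ..., "value": ...} pairs.
for entry in resp.json()[0]:
    print(entry["index"], entry["value"])
```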
Local install failure
#373 opened by plageon - 1
How do I deploy to Vertex AI?
#380 opened by pulkitmehtaworkmetacube - 1
Question for released docker image
#383 opened by PeterYang12 - 2
TEI fails for Finetuned JinaAI Embeddings models
#384 opened by StefanRaab - 1
Incomplete tutorial
#395 opened by sleepingcat4 - 1
Support NV-Embed-v2 model
#419 opened by jorgeantonio21 - 2
thread 'tokio-runtime-worker' panicked at /usr/src/backends/src/lib.rs:176:14
#424 opened by jackli0127 - 2
Get opentelemetry trace id from request headers instead of creating a new trace
#374 opened by ptanov - 0
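A sketch of the client side of the behaviour requested in #374: sending a W3C `traceparent` header that the router would join instead of starting a new trace. The trace and span IDs below are the illustrative values from the W3C Trace Context spec:

```python
import requests

trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
parent_span_id = "00f067aa0ba902b7"
# W3C traceparent: version-traceid-spanid-flags
headers = {"traceparent": f"00-{trace_id}-{parent_span_id}-01"}

resp = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": "hello"},
    headers=headers,
    timeout=30,
)
print(resp.status_code)
```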
Use Nomic Embed from local weights
#422 opened by sinaayyy - 0
New model: lier007/xiaobu-embedding-v2
#423 opened by Mars-1990 - 0
Feature Request: Multi-GPU inference or the ability to choose a GPU at startup
#411 opened by dangerzone - 1
Support env `HF_ENDPOINT`?
#416 opened by yufeng97 - 0
Different behavior between SentenceTransformer and TEI when using gte-large-en-v1.5
#358 opened by Smityz - 1
Too many router/tokenizer threads
#404 opened by askervin - 0
Paged attention optimization for memory-efficient continuously batched requests
#391 opened by jorgeantonio21 - 2
Unsupported model IR version
#355 opened by netw0rkf10w - 0
Serving bge-reranker-v2-m3
#408 opened by gree2 - 1
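A sketch of querying a TEI instance serving a reranker such as BAAI/bge-reranker-v2-m3 (#408 above). The request and response fields follow TEI's `/rerank` route; treat them as assumptions if your version differs:

```python
import requests

resp = requests.post(
    "http://localhost:8080/rerank",
    json={
        "query": "What is Deep Learning?",
        "texts": ["Deep learning is a subset of ML.", "Cheese is made from milk."],
    },
    timeout=30,
)
resp.raise_for_status()

# Each result pairs the index of an input text with its relevance score.
for item in resp.json():
    print(item["index"], item["score"])
```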
Max Token Mismatch
#406 opened by endomorphosis - 1
unknown flag: --gpus. See 'docker run --help'
#397 opened by sswolf - 0
Inconsistency in how different URL paths are handled (in inference endpoints)
#398 opened by MoritzLaurer - 0
dunzhang/stella_en_1.5B_v5 Maximum Token Limit Set to 512 Despite Model Capabilities
#396 opened by taoari - 0
Input validation error: `inputs` must have less than 32000 characters. Given: 67337
#394 opened by ffalkenberg - 0
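A client-side workaround sketch for #394 above: TEI rejects single inputs over the 32000-character validation limit, so split long documents before sending. The chunk size is an arbitrary illustration value, chosen to stay under the limit:

```python
import requests

MAX_CHARS = 30_000  # stay below the 32000-character validation limit

def chunk(text: str, size: int = MAX_CHARS) -> list[str]:
    """Split a long document into fixed-size character chunks."""
    return [text[i : i + size] for i in range(0, len(text), size)]

long_doc = "word " * 15_000  # stand-in for a ~67k-character document

# /embed also accepts a list of inputs, so all chunks go in one request.
resp = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": chunk(long_doc)},
    timeout=60,
)
resp.raise_for_status()
print(len(resp.json()), "chunks embedded")
```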
Can TEI support bge-visualized?
#393 opened by whybeyoung - 2
Adding support for stella_en_400M_v5
#359 opened by netw0rkf10w - 6
Add support for bge-reranker-v2.5-gemma-lightweight
#368 opened by ziozzang - 0
`bge-reranker-v2-m3` model throughput benchmark
#392 opened by rere950303 - 3
Request support for Llama Prompt Guard
#354 opened by bluenevus - 1
curl: (56) Recv failure: Connection reset by peer
#387 opened by luyu0816 - 0
TEI failed to serve fine-tuned bge-m3 model
#385 opened by KCFindstr - 0
Support Alibaba-NLP/gte-large-en-v1.5 on CPU/MPS
#375 opened by tmostak - 0