michaelfeil/infinity
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
Python · MIT License
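Since Infinity serves embeddings over a REST API compatible with the OpenAI embeddings schema, a request can be sketched as below. This is a minimal sketch, assuming the default local host/port (`localhost:7997`) and an `/embeddings` route; the `build_embedding_request` helper is hypothetical, not part of Infinity itself.

```python
import json


def build_embedding_request(model: str, texts: list[str]) -> tuple[str, dict]:
    """Build a request for an OpenAI-compatible /embeddings endpoint.

    Hypothetical helper; the URL assumes Infinity's default local
    host and port, which may differ in your deployment.
    """
    url = "http://localhost:7997/embeddings"
    payload = {"model": model, "input": texts}
    return url, payload


# Example: request embeddings for one sentence from a served model.
url, payload = build_embedding_request("BAAI/bge-small-en-v1.5", ["hello world"])
print(url)
print(json.dumps(payload))
```

The payload could then be POSTed with any HTTP client (e.g. `requests.post(url, json=payload)`), and the response parsed per the OpenAI embeddings response shape.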
Issues
BUG ERROR: Server stops accepting new requests after _core_batch(self) exceptions
#242 opened by vitteloil - 6
nvidia/NV-Embed-v1
#239 opened by Strive-for-excellence - 1
Deberta v3 not working
#241 opened by Stealthwriter - 2
Multi-Modal Inference / Clip
#147 opened by michaelfeil - 6
API Key Authentication for Michaelfeil Infinity
#207 opened by AjayKarma05 - 9
[Docs] Add quantization / dtype doc
#150 opened by michaelfeil - 1
deberta support
#235 opened by Stealthwriter - 1
Tensor-parallelism for multi-gpu support
#213 opened by SalomonKisters - 6
ValueError: No onnx files found
#225 opened by netw0rkf10w - 4
Error in offline mode with `trust_remote_code`: SFR-Embedding-Mistral and nomic do not work without `einops`
#185 opened by prasannakrish97 - 2
Include `einops` in docker image
#231 opened by chiragjn - 3
Scores slightly off/get rounded up to 1.0
#203 opened by ruben-vb - 1
API Token
#226 opened by vladimirmujagic - 3
Docker path not in readme
#222 opened by hughesadam87 - 9
BAAI/bge-reranker-base startup error
#218 opened by andrew-at-rise - 2
Update docs based on feedback.
#148 opened by michaelfeil - 1
docker compose with a folder of models
#215 opened by shuther - 2
mxbai-rerank-large-v1 startup error
#219 opened by edisonzf2020 - 2
Loading models from local path
#217 opened by vladimirmujagic - 1
The jinaai/jina-embeddings-v2-base-zh model reports an error when importing documents into RAG.
#220 opened by edisonzf2020 - 3
Hanging after first embedding generated on MPS
#206 opened by semoal - 9
Support for instructor/instructor-xl models
#125 opened by BBC-Esq - 4
Support for Python 3.8 in infinity
#209 opened by BarryRun - 1
Load local model
#204 opened by jmoney - 2
HF_HOME not respected
#194 opened by WinsonSou - 6
shrink: docker image size by pruning venv
#139 opened by peebles - 5
Move `.detach().cpu()` into `encode_core`, and option to use cuda streams
#155 opened by jobright-jiyuan - 0
Add a TextSplitter in LangChain to share the model of the embedding model
#193 opened by Jimmy-Newtron - 1
Issue templates
#149 opened by michaelfeil - 1
model name is not consistent across endpoints
#178 opened by bufferoverflow - 3
Safetensors, or how to be sure not to load pickled weights
#164 opened by wllhf - 0
Does this work with re-rankers?
#165 opened by cduk - 6
float16 and other optimizations help?
#159 opened by BBC-Esq - 5
How to run or access infinity on an HF space?
#161 opened by ffreemt - 1
Love the repo! Wish I could help!
#157 opened by BBC-Esq - 1
benchmarks?
#158 opened by BBC-Esq - 4
Question: Support for sparse embeddings?
#146 opened by Matheus-Garbelini - 7
Content-Encoding: gzip
#136 opened by andrew-at-rise - 0
Adding mkdocs url
#138 opened by michaelfeil - 0
"msg":"Input should be a valid list"
#130 opened by fishfree - 4
Reranker model fails to load (maidalun1020/bce-reranker-base_v1) - no max token length is set
#127 opened by Matheus-Garbelini - 1
Support for nomic-ai/nomic-embed-text-v1.5
#123 opened by SupreethRao99 - 1
Asking to truncate to max_length but no maximum length
#121 opened by semoal