[REQUEST] Better Infinity Embeddings support

Question

arbi-dev opened this issue 4 months ago · 2 comments

Problem

It's great that Tabby supports the fantastic infinity_emb backend for local embeddings, but there are a couple of missing features:

Infinity_emb is not included in the official Docker image. Although it can be added by building a custom image, ideally it would work out of the box.
Infinity_emb supports loading a reranker in addition to an embedding model. It would be great to support loading both models on the same Tabby instance (if I am not mistaken, this is not currently possible).

Solution

- Rebuild the official Docker image with [infinity_emb]
- Support multiple --embedding-model-name values at the same time

Alternatives

No response

Explanation

Tabby, Exllama, and Infinity are great options for environments like Kubernetes, where it's better to use official image builds. Also, having multiple models on one instance makes more efficient use of GPU resources.

Thank you, everyone, for your hard work on these projects!

Examples

No response

Additional context

No response

Acknowledgements

I have looked for similar requests before submitting this one.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will make my requests politely.

Answer 1 · 2024-10-10T04:08:28.000Z

The Docker change shouldn't be a big issue. It's possible to have the Docker build pull extras before compiling. However, re-ranking models are a different story.

Any infinity-emb model type other than embeddings falls outside the scope of TabbyAPI and would bloat the codebase. If you'd like to use other types of models at that level, I'd suggest running infinity-emb itself. Then write a program that bridges Tabby and infinity-emb and serves the end user.

Tabby is meant to be a cog in the machine, not the entire machine.
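
For illustration, the bridge suggested above could start as a small routing layer in front of the two servers. This is only a sketch under assumptions: the base URLs, the ports, and the `/rerank` path prefix are hypothetical placeholders, not values confirmed in this thread, and the code only decides which backend a request belongs to. The actual HTTP forwarding is left to whatever web framework or reverse proxy is used.

```python
# Hypothetical base URLs; replace with wherever each server actually listens.
BACKENDS = {
    "tabby": "http://localhost:5000",
    "infinity": "http://localhost:7997",
}

# Assumed path prefixes that should be served by infinity-emb.
# Everything else is sent to Tabby.
INFINITY_PREFIXES = ("/rerank",)


def pick_backend(path: str) -> str:
    """Return the name of the backend that should handle this request path."""
    if path.startswith(INFINITY_PREFIXES):
        return "infinity"
    return "tabby"


def upstream_url(path: str) -> str:
    """Build the full upstream URL for a request path."""
    base = BACKENDS[pick_backend(path)].rstrip("/")
    return base + path


if __name__ == "__main__":
    for p in ("/v1/embeddings", "/v1/chat/completions", "/rerank"):
        print(p, "->", upstream_url(p))
```

Keeping the routing decision in a pure function like `pick_backend` makes it easy to test without either server running, and the same table could be translated into reverse-proxy rules instead of application code.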