aurelio-labs/semantic-router

Support for Infinity as Encoder


I run a community project, https://github.com/michaelfeil/infinity, which should help with encoding.

Motivation:

Infinity serves sentence embeddings and rerankings with transformer models and batches concurrent requests asynchronously, so it could back a fast, locally hosted encoder for semantic-router.

Questions:

  • Does __call__ support async calls?
  • What's the policy for optional imports in this lib?

For reference, here is Infinity's async rerank API in use:

import asyncio
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

query = "What is the python package infinity_emb?"
docs = [
    "This is a document not related to the python package infinity_emb, hence...",
    "Paris is in France!",
    "infinity_emb is a package for sentence embeddings and rerankings using transformer models in Python!",
]

# Cross-encoder reranking model, run locally via the torch engine.
engine_args = EngineArgs(model_name_or_path="BAAI/bge-reranker-base", engine="torch")
engine = AsyncEmbeddingEngine.from_args(engine_args)

async def main():
    async with engine:
        # Rerank the docs against the query.
        ranking, usage = await engine.rerank(query=query, docs=docs)
        print(list(zip(ranking, docs)))

asyncio.run(main())
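
Since the issue is about encoders, the corresponding embedding call is a one-liner on the same engine. A minimal sketch; the model choice here is just an example:

import asyncio
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

sentences = ["Embed this via Infinity.", "Paris is in France."]
engine = AsyncEmbeddingEngine.from_args(
    EngineArgs(model_name_or_path="BAAI/bge-small-en-v1.5", engine="torch")
)

async def main():
    async with engine:
        # embed() returns one vector per input sentence plus token usage.
        embeddings, usage = await engine.embed(sentences=sentences)
        print(len(embeddings), usage)

asyncio.run(main())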

@jamescalam please add support for Infinity embeddings.

Hi @therahulparmar and @michaelfeil, we're able to accept PRs for the Infinity encoder; the library itself should be added as an optional dependency (see the huggingface, voyage, etc. encoders as examples here).
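
For anyone picking this up, here is a minimal sketch of that optional-dependency pattern. The InfinityEncoder class name, its methods, and the error message are illustrative, not semantic-router's actual API; the real encoder should mirror how the existing huggingface/voyage encoders are wired up:

import asyncio
from typing import List

class InfinityEncoder:
    """Hypothetical encoder: infinity_emb is imported lazily so it only
    needs to be installed when this encoder is actually used."""

    def __init__(self, model_name: str = "BAAI/bge-small-en-v1.5"):
        try:
            from infinity_emb import AsyncEmbeddingEngine, EngineArgs
        except ImportError as e:
            raise ImportError(
                "infinity_emb is not installed; install the optional "
                "dependency to use the Infinity encoder."
            ) from e
        self._engine = AsyncEmbeddingEngine.from_args(
            EngineArgs(model_name_or_path=model_name, engine="torch")
        )

    def __call__(self, docs: List[str]) -> List[List[float]]:
        # Synchronous entry point; note this fails if an event loop is
        # already running, which is why native async support matters
        # (see the discussion below).
        return asyncio.run(self._embed(docs))

    async def _embed(self, docs: List[str]) -> List[List[float]]:
        async with self._engine:
            embeddings, _usage = await self._engine.embed(sentences=docs)
            return [list(e) for e in embeddings]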

@jamescalam Does __call__ support async calls?

@michaelfeil not right now; we can add support though. Is it required for Infinity?

Yeah, the batching happens by gathering multiple concurrent async requests into a single forward pass. The same mechanism is used when a requested batch is larger than what fits through the model at once.

If there is no async event loop running, this is challenging to control from inside Infinity.
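
To make the batching point concrete, a sketch of what concurrent callers look like; as I understand it, Infinity's internal queue merges requests that are awaited at the same time into larger forward passes (model name is just an example):

import asyncio
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

engine = AsyncEmbeddingEngine.from_args(
    EngineArgs(model_name_or_path="BAAI/bge-small-en-v1.5", engine="torch")
)

async def main():
    # Eight "callers" embedding chunks concurrently; awaiting them together
    # lets the engine batch across requests instead of serving them one by one.
    chunks = [[f"document {i}-{j}" for j in range(32)] for i in range(8)]
    async with engine:
        results = await asyncio.gather(
            *(engine.embed(sentences=chunk) for chunk in chunks)
        )
    for embeddings, usage in results:
        print(len(embeddings), usage)

asyncio.run(main())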