monarchwadia/ragged

Support connection pooling & multiple sources

Closed this issue · 1 comment

Consider the case where the dev wants to connect to Azure OpenAI.

Azure OpenAI provisions a limited number of dedicated LLM deployments for you. These deployments need to be queried in different styles, such as round-robin or lowest-rate-limit-first. In such a case, the connection pool should take a list of Azure OpenAI endpoints and credentials.

For example:

const llmProviders = [
  { name: "azure-gpt4-1", type: "openai", url: "...", apiKey: "..." },
  { name: "azure-gpt4-2", type: "openai", url: "...", apiKey: "..." },
  { name: "azure-gpt4-3", type: "openai", url: "...", apiKey: "..." }
];
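
For illustration, here is a minimal TypeScript sketch of how a pool might pick a provider from a list like the one above. The LlmProvider shape and the RoundRobinPool class are assumptions for this example, not part of Ragged's API.

// Hypothetical provider shape, mirroring the llmProviders example above.
interface LlmProvider {
  name: string;
  type: "openai";
  url: string;
  apiKey: string;
}

// Minimal round-robin selection over a fixed list of providers.
// A "lowest rate limit" strategy would replace next() with a pick
// based on each provider's remaining quota.
class RoundRobinPool {
  private index = 0;

  constructor(private providers: LlmProvider[]) {
    if (providers.length === 0) {
      throw new Error("Pool needs at least one provider");
    }
  }

  next(): LlmProvider {
    const provider = this.providers[this.index];
    this.index = (this.index + 1) % this.providers.length;
    return provider;
  }
}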

We need more requirements, but some considerations and questions I'm expecting are:

  • Do we support mixed-use LLMs?
  • How do we report rate limits?
  • What happens to retry logic?
  • Do we support this at a low level, or is this something that is built ON TOP of Ragged, using multiple Ragged instances? I can see it going both ways; a rough sketch of the "on top" approach follows this list.
  • Do we support this at all?
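
As a rough sketch of the "built on top" approach, assuming one client instance per provider: the OpenAiLikeClient interface and its chat() method below are placeholders for this example, not Ragged's actual API.

// Placeholder for whatever client each provider maps to (e.g. one client
// instance per Azure deployment). The chat() signature is an assumption.
interface OpenAiLikeClient {
  chat(prompt: string): Promise<string>;
}

// Wraps several clients and round-robins each call across them, so the
// pooling lives entirely above the underlying library.
class PooledClient implements OpenAiLikeClient {
  private index = 0;

  constructor(private clients: OpenAiLikeClient[]) {}

  async chat(prompt: string): Promise<string> {
    const client = this.clients[this.index];
    this.index = (this.index + 1) % this.clients.length;
    return client.chat(prompt);
  }
}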

This should be possible with custom adapters now. I don't think we should implement this ourselves; it's a niche case that someone in the community may be able to share code for.