monarchwadia/ragged

Support connection pooling & multiple sources

Closed this issue · 1 comment

Consider the case where the dev wants to connect to Azure OpenAI.

Azure OpenAI provisions a limited number of dedicated LLM deployments for you. These deployments need to be queried in different styles, such as round-robin or lowest-rate-limit-first. In such a case, the connection pool should take a list of Azure OpenAI endpoints and credentials.

For example:

const llmProviders = [
  { name: "azure-gpt4-1", type: "openai", url: "...", apiKey: "..." },
  { name: "azure-gpt4-2", type: "openai", url: "...", apiKey: "..." },
  { name: "azure-gpt4-3", type: "openai", url: "...", apiKey: "..." }
];
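
For illustration, here is a minimal TypeScript sketch of how a pool might pick a provider from a list like the one above. The LlmProvider shape and the RoundRobinPool class are assumptions for this example, not part of Ragged's API.

// Hypothetical provider shape, mirroring the llmProviders example above.
interface LlmProvider {
  name: string;
  type: "openai";
  url: string;
  apiKey: string;
}

// Minimal round-robin selection over a fixed list of providers.
// A "lowest rate limit" strategy would replace next() with a pick
// based on each provider's remaining quota.
class RoundRobinPool {
  private index = 0;

  constructor(private providers: LlmProvider[]) {
    if (providers.length === 0) {
      throw new Error("Pool needs at least one provider");
    }
  }

  next(): LlmProvider {
    const provider = this.providers[this.index];
    this.index = (this.index + 1) % this.providers.length;
    return provider;
  }
}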

We need more requirements, but some considerations and questions I'm expecting are:

  • Do we support mixed-use LLMs?
  • How do we report rate limits?
  • What happens to retry logic?
  • Do we support this at a low level, or is this something that is built ON TOP of Ragged, using multiple Ragged instances? I can see it going both ways; a rough sketch of the "on top" approach follows this list.
  • Do we support this at all?
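
As a rough sketch of the "built on top" approach, assuming one client instance per provider: the OpenAiLikeClient interface and its chat() method below are placeholders for this example, not Ragged's actual API.

// Placeholder for whatever client each provider maps to (e.g. one client
// instance per Azure deployment). The chat() signature is an assumption.
interface OpenAiLikeClient {
  chat(prompt: string): Promise<string>;
}

// Wraps several clients and round-robins each call across them, so the
// pooling lives entirely above the underlying library.
class PooledClient implements OpenAiLikeClient {
  private index = 0;

  constructor(private clients: OpenAiLikeClient[]) {}

  async chat(prompt: string): Promise<string> {
    const client = this.clients[this.index];
    this.index = (this.index + 1) % this.clients.length;
    return client.chat(prompt);
  }
}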

This should be possible with custom adapters now. I don't think we should implement this ourselves; it's a niche case that someone in the community may be able to share code for.