BerriAI/liteLLM-proxy

Info Endpoint in OpenAI v1/models style


It would be helpful to provide information about which model is running.
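
For reference, OpenAI's `GET /v1/models` returns a list of model objects. Below is a minimal sketch of what an equivalent endpoint on the proxy could return; the FastAPI setup and the placeholder model ids are assumptions for illustration, not litellm's actual implementation.

```python
# Sketch of an OpenAI-style /v1/models endpoint (FastAPI assumed;
# model ids are placeholders, not litellm's real config).
from fastapi import FastAPI

app = FastAPI()

# In practice this list would come from the proxy's configuration.
CONFIGURED_MODELS = ["gpt-3.5-turbo", "gpt-4", "mistral-7b"]

@app.get("/v1/models")
def list_models():
    # Mirrors the shape of OpenAI's model list response,
    # so existing OpenAI clients can consume it unchanged.
    return {
        "object": "list",
        "data": [
            {"id": m, "object": "model", "owned_by": "litellm-proxy"}
            for m in CONFIGURED_MODELS
        ],
    }
```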

Hey @michaelfeil - definitely. Can you tell me more about what you're trying to achieve?

I would like to host multiple instances of e.g. TGI and set up an API gateway (litellm) in the center, which can handle e.g. GPT-3.5, GPT-4 and Mistral-7B under ONE URL. To do that, I need some kind of "info" endpoint / config which can do this dynamically, depending on uptime in k8s.
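
A hedged sketch of the dynamic part of that idea: the gateway only reports a model as available when its upstream deployment answers a health check. The service URLs, the `/health` path, and the model-to-deployment mapping are illustrative assumptions, not litellm configuration.

```python
# Sketch: filter the model list by upstream health. Hosted providers
# (OpenAI, Anthropic) could simply be treated as always available;
# the entries below are hypothetical in-cluster TGI deployments.
import httpx

TGI_DEPLOYMENTS = {
    "mistral-7b": "http://tgi-mistral.default.svc.cluster.local:8080",
    "zephyr-7b": "http://tgi-zephyr.default.svc.cluster.local:8080",
}

def live_models() -> list[str]:
    live = []
    for model, base_url in TGI_DEPLOYMENTS.items():
        try:
            # Treat any 2xx from the deployment's health route as "running".
            resp = httpx.get(f"{base_url}/health", timeout=2.0)
            if resp.status_code < 300:
                live.append(model)
        except httpx.HTTPError:
            pass  # deployment is down or unreachable; omit it from the list
    return live
```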

What does 'which model is running' mean in that context?

E.g. if you can call OpenAI and Anthropic via the server, would that mean both models are 'running'?

And how is 'running' different from 'available'?