Using LiteLLM to support more models
HakaishinShwet opened this issue ยท 13 comments
This project is pretty great BUT we need more options to use different LLM's.You don't have to worry about creating a solution which supports 100+ LLM easily as LiteLLM is another foss project which is capable of doing this task for you.
Project LiteLLM link - https://github.com/BerriAI/litellm
Adding LiteLLM will be big win for the project as many will be easily able to use many more LLM easily which everyone wants and project will require 3 major parameters from user like base url,model name,api key that's all and with open ai api general structure it can query and give back result for the query.Many big projects have started adding support for this project in there project to make things advanced in easier way so study it and after that if you have any query you can ask them they are pretty responsive plus if u want to know more about my personal experience of using it with other great projects like flowise then I can tell you that too .
Sounds nice, I agree that this makes sense. However, this would need some amount of refactoring (cost tracking for example needs to be set up differently).
This is a super low priority issue right now. Most LMs would not really be able to achieve good performance on SWE-agent + SWE-bench... so this would be a waste of time right now + be hard to maintain probably?
@ofirpress Not only that, there is another coding assistant that offers LiteLLM support: https://github.com/OpenDevin/
It is also under MIT licence.
@ofirpress Litellm make things easy for both devs and user to test and try 100+ LLM easily. Implementing Litellm right now maybe little challenging but in my opinion will make things much more easy for future rather than implementing later(with low priority mindset) when projects become much more complex.Benefit is for both and that is why many projects are implementing too.
One more thing i would like to add in this is that do not underestimate different closed source model and open source models compared to open ai gpt variants because many have equalized or outperformed them and will keep on outperforming them in future maybe in one domain/segment like just coding or sql(specific task based open source model) or in generalized way so i believe in giving users option to test whatever they want and in easiest way possible so that they can show you more interesting stuff then what u can imagine someday.
Rest i leave this matter to team All the best :-))
add support for LM-studio so we can test with local opensource models this should be pretty easy since it uses the openai library.
Supporting local model with litellm or vllm is much more necessary to boost the project with open sources power.
As a way simpler approach, could you consider exposing OPEN_AI_BASE_URL enviromental variable? (Model might be also required for the calls to different models)
In this way, users could use any OpenAI compatible endpoint to run these models. This would open up compatiblity with backends such as Ollama, Text Generation WebUI, LM Studio, etc.
So, if a user want to use Ollama, they should just type the base_url: http://127.0.0.1:11434/, with any string as API Key, and the chosen model. LM Studio users would use http://127.0.0.1:1234
Note that base_url parameter is supported natively by OpenAI python library, so it probably will not require any extra configuration down the road: https://github.com/openai/openai-python/blob/0470d1baa8ef1f64d0116f3f47683d5bf622cbef/src/openai/_base_client.py#L328
I think LiteLLM might make sense eventually because it would allow us to get rid of a chunk of code with hard-coded values. Right now, every time openAI updates their costs or their models, we'll have to update the config as well. LiteLLM would handle a lot of these things for us and give us more support of other models for free. But it's low priority.
If LiteLLM doesn't do it, projects like https://github.com/AgentOps-AI/tokencost might help with cost estimation
I'd be open to include litellm as long as we don't disrupt the current research that's using GPT4, Claude 2, Claude 3. So a good way to start would be to add it as an alternative in models.py
and then we start thinking if we move more stuff over. Having someone open a PR with a proof of concept of how this would look would also be helpful.
However, at the moment it's also not a big priority, because many of the cheaper models aren't good enough to perform well with SWE-agent.
While I agree on the problem and the motivation for solving it, I think it would be good to do a quick scan of the solution space to see what options are out there. If LiteLLM is indeed the best solution for this problem then that would only confirm that.
I think it's important to support multiple API inference providers, as it allows adapting quickly with new releases. Especially with Llama 3's strong performance and a 405B version on the horizon, we won't know yet which API providers are going to offer it and at what prices (if any).
Or the next model from Cohere, Mistral, Reka or someone else.
It would probably also be good for the robustness of the project to be independent of specific LLMs.