Support for Multiple Vision Models with easy Interface
Opened this issue · 3 comments
Hi @haseeb-heaven, first, thanks for your great work tying all the common LLMs together in one tool. Really appreciated.
I was looking at the Vision AI models. In the documentation you wrote that it currently supports open-source Vision AI models and Vision APIs, but is there any possibility of adding a UI, maybe a web page running on localhost, for working with the LLMs: prompt writing, image uploading, a drop-down to switch between LLMs, and text fields for entering their API keys?
Hi
Currently it only supports GPT Vision and Google AI Vision models; others are not added yet. As for the interface, it is CLI-only, so for vision models you would need to modify it yourself to get a UI for image upload or chat.
The interface is simple for all users to use, but for additional features like image upload and a chat interface we could create a new interface. That would require a lot of re-designing of the interpreter.py and interpreter_lib.py classes.
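As a rough illustration of what the requested web page could look like (purely a sketch, not part of the current codebase; the model names, port, and the form fields are placeholder assumptions), a minimal localhost UI can be served with Python's standard library alone:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical list of selectable models; a real integration would map
# these to the interpreter's own backends.
MODELS = ["gpt-4-vision", "gemini-pro-vision", "llava"]

def render_page(models):
    """Build the HTML form suggested in the issue: prompt box,
    image upload, model drop-down, and an API-key field."""
    options = "".join(f'<option value="{m}">{m}</option>' for m in models)
    return f"""<!DOCTYPE html>
<html><body>
  <form method="post" enctype="multipart/form-data">
    <textarea name="prompt" placeholder="Enter your prompt"></textarea>
    <input type="file" name="image">
    <select name="model">{options}</select>
    <input type="password" name="api_key" placeholder="API key">
    <button type="submit">Run</button>
  </form>
</body></html>"""

class InterpreterUI(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the form; a POST handler would forward the prompt,
        # image, and key to the selected vision model.
        body = render_page(MODELS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Bind to localhost only, as requested in the issue.
    HTTPServer(("127.0.0.1", 8000), InterpreterUI).serve_forever()
```

A framework like Gradio or Streamlit would give the same form with less code, but a stdlib sketch keeps the dependency footprint of the CLI tool unchanged.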
Thanks for the reply. Any plans to upgrade this with Gemini?
Open-source vision models like LLaVA and GUI support would make CODE-Interpreter a more generic and interesting tool.
This was meant to be a command-line tool only, for ease of use, but for images we could have a separate branch with a GUI to make it more usable.