Local LLM support in Wingman: Currently more of a "community task" - here's why!
BlankFX1 opened this issue · 10 comments
I'm not a fan of OpenAI and their products and am not willing to use or support them.
Please add support for other, by now well-established, APIs as well.
Like Text Generation WebUI ( https://github.com/oobabooga/text-generation-webui ) and
KoboldCPP ( https://github.com/LostRuins/koboldcpp ).
Both can be run locally on a user's system and are great at what they do, depending on the user's hardware specifications.
Text Generation WebUI recently even implemented an OpenAI-compatible API, so it might be pretty easy to support them.
It would be a blast to see Wingman-AI supporting it soon, as I'm eager to start testing it.
It's almost impossible for us to ship a local LLM to users and the setup would be too difficult for a majority of them. Our goal with Wingman AI is to make AI as accessible as possible for a broad audience (of mostly gamers).
We and several community members already played around with local LLMs like Mistral 7b etc. Apart from deployment and setup, most of them are either too demanding in terms of hardware (especially if you're running it alongside demanding games like Star Citizen) or just not capable enough (yet). Wingman needs function calls to be able to execute keypress commands. We also need a good base model for multilingual conversations. Yes, all of this can be done if you have a monster PC and deep technical knowledge but no "normal" user will be able to configure that with what's currently available and it would be impossible for us to support.
The Wingman Core is designed so that you CAN plug in local LLMs, and we want devs to be able to do that. But it's not on our (= ShipBit core dev) priority list right now, nor for the near future.
We will, however, keep an eye on the market and test new and promising remote solutions like Google Gemini when it's available. Until then, (have someone) build a custom wingman! It can and has been done before!
I'll leave this issue open for reference.
I still think it should be made possible.
You can of course keep making it as easy as possible for most gamers by concentrating on your current OpenAI-based approach, and I'm not asking you to change that.
There is also no need to deploy a local LLM to the user and support it.
Yet, I think you should allow advanced users to change the target server Wingman contacts to a user-defined address like https://127.0.0.1:5000.
Just implement it in Wingman's UI as an experimental feature for advanced users and let them know your support ends there.
As Text-Generation-WebUI recently became OpenAI-API-compatible, there's a chance there isn't much you would have to change to make it work. They implemented this feature explicitly to allow applications that use ChatGPT, like Wingman, to easily support it as well.
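For illustration, this is roughly how the official openai Python client (1.x) could be pointed at such a local OpenAI-compatible endpoint. Untested sketch; the URL, API key and model name are placeholders for whatever the local backend exposes:

```python
from openai import OpenAI

# Point the standard OpenAI client at a local OpenAI-compatible server,
# e.g. Text Generation WebUI's API extension. URL and model name are placeholders.
client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # most local backends ignore or map this name
    messages=[{"role": "user", "content": "Set a course for the nearest station."}],
)
print(response.choices[0].message.content)
```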
As for the rest, it shouldn't be much of your concern. Using a local LLM is the user's personal business, including performance and multilingual capability.
If Mistral 7B was the last checkpoint you looked into, I highly recommend taking a fresh look.
Quantization and ExLlamav2 allow LLM checkpoints of around 20B parameters to run easily and fast on a GPU.
Using GGUF even allows running LLMs entirely off-GPU, using only RAM and CPU, while still reaching high token rates. And as the technology progresses quickly, the required hardware specs and performance hits keep getting lower.
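As a rough idea of how little code CPU-only inference takes nowadays, here is an untested sketch using llama-cpp-python as one example backend (the model file name is a placeholder):

```python
from llama_cpp import Llama

# Load a quantized GGUF checkpoint entirely on CPU (n_gpu_layers=0).
# The file name is a placeholder for whatever quantized model you downloaded.
llm = Llama(model_path="./mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096, n_gpu_layers=0)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Lower the landing gear."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```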
If you really want Wingman AI "to make AI as accessible as possible for a broad audience (of mostly gamers)", there is basically no way around making local LLMs an optional(!) choice.
Especially if you want to reach gamers who can't afford OpenAI but do have access to a public/shared (not so...) local LLM.
I think there is a misunderstanding. This is already possible! We expose the base_url param of the OpenAI Python client in our config so that you CAN already set it to something like localhost:5000. If you are using a drop-in replacement for OpenAI, you can just inject it like this. None of the ones I've seen supported everything we do, but maybe that has improved. People in our community are already using that mechanism to test local LLMs.
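For illustration, the injection roughly boils down to this (a minimal sketch; the URL is a placeholder and the exact config key name may differ between Wingman versions):

```python
from openai import OpenAI

# base_url comes from the Wingman config; when it's left empty, the client
# falls back to the official OpenAI endpoint.
base_url = "http://localhost:5000/v1"  # placeholder value taken from the config
client = OpenAI(base_url=base_url or None, api_key="sk-placeholder")
```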
If you want to support some of the more special models and systems, you'd probably have to write your own Custom Wingman - which is also possible with Wingman AI. People are already doing that, too.
I'd suggest you check out our Discord server and get in touch with the other "local LLM" people. All I meant was that we (meaning ShipBit) don't plan on spending development effort on the current local LLM solutions. This is a community-based open source project, though. If you build something good and think it will meet our expectations and requirements for "official support", let's talk. We probably wouldn't reject a PR that matches these criteria (as already happened with XVASynth as a TTS provider, for example). Even if we don't merge it, you can always run your own fork.
Hi there. Is OpenAI-style function calling now supported in koboldcpp or textgenwebui? That's the backbone of what makes Wingman AI special in terms of actually impacting the game and not just chatting. Last time I checked, it wasn't. I even tried working on it for koboldcpp. That's a major obstacle.
According to oobabooga/text-generation-webui#4455, it sadly isn't supported there yet.
I'm not aware of anything related to function calling in koboldcpp either.
There appear to be other issues when trying to use Wingman with a local LLM.
e.g. UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa4 in position 4: invalid start byte
in TextGenWebUI after finishing and sending a recording in Wingman.
So for the time being, I gave up on this project.
This is my high-level thinking on how function calling might be implemented in koboldcpp or textgen, but I also gave up on it because it's very difficult to universalize across the many different LLMs people might choose to use. Maybe it will spur further thoughts for you: LostRuins/koboldcpp#585 (comment)
Maybe, for the time being, a more pragmatic solution might be good enough: simply parsing the text output and searching for keyword strings.
E.g. if the LLM outputs something like "Understood. Engaging Landing Gear.", just the mention of "Landing Gear" could trigger the corresponding button press.
It's not 100% fail-safe, but giving the LLM a fitting context like "If you receive the order to engage the landing gear, make sure to mention the landing gear in your answer." might help a lot.
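A rough, untested sketch of what I mean; the keyword-to-key mapping and the use of pyautogui are just examples, not how Wingman actually sends keypresses:

```python
import re
import pyautogui  # one possible way to send a keypress; just an example here

# Hypothetical mapping from keywords the LLM is told to mention
# to the in-game key bindings they should trigger.
KEYWORD_COMMANDS = {
    "landing gear": "n",
    "quantum drive": "b",
}

def execute_from_text(llm_output: str) -> None:
    """Scan the model's reply for known keywords and press the mapped key."""
    text = llm_output.lower()
    for keyword, key in KEYWORD_COMMANDS.items():
        if re.search(rf"\b{re.escape(keyword)}\b", text):
            pyautogui.press(key)

execute_from_text("Understood. Engaging Landing Gear.")
```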
Yeah, that might work at least for a separate "local wingman". In kobold or textgen, try adding a prompt that includes instructions and the various actions of the base computer wingman in a list and see how various 7B models do with reporting the appropriate action back. If you find a good prompt/model combo that seems pretty reliable, let me know! Another idea could be to send every prompt through twice, though that would obviously add 2x latency: the first time through, just ask the model to select an action; the second time through, omit the actions and just ask for a verbal response.
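Roughly like this, as an untested sketch against an OpenAI-compatible local backend (action names, URL and model id are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="not-needed")
ACTIONS = ["engage_landing_gear", "toggle_quantum_drive", "none"]

def handle(user_text: str) -> tuple[str, str]:
    # Pass 1: only ask the model to pick an action from the list.
    action = client.chat.completions.create(
        model="local-model",
        messages=[
            {"role": "system", "content": "Reply with exactly one of: " + ", ".join(ACTIONS)},
            {"role": "user", "content": user_text},
        ],
    ).choices[0].message.content.strip()

    # Pass 2: ask for the spoken reply without mentioning any actions.
    reply = client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": user_text}],
    ).choices[0].message.content

    return action, reply
```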
all-local-wingman-4k.mp4
This is Wingman 1.0.0 RC1 using the local STT provider whispercpp and talking to a local Mistral 7B derivative running in LM Studio (as a stand-in replacement for OpenAI) on my machine. Zero custom code, just configuration...
The only thing it can't do out of the box is AI function calling / AI commands, but that's because the model does not support it in the same format as OpenAI does. Did you get any closer? I know there are models and derivatives that support function calling, but if they don't use the same tools format as OpenAI, we'd need some adjustments for different models.
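For reference, a simplified example of the OpenAI tools format in question; the function name and parameters here are made up for illustration and are not Wingman's actual schema:

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="not-needed")  # placeholder endpoint

# A local backend would have to accept a schema like this and answer
# with matching tool_calls for AI commands to work.
tools = [{
    "type": "function",
    "function": {
        "name": "execute_command",  # illustrative name
        "description": "Execute a named in-game command as a keypress.",
        "parameters": {
            "type": "object",
            "properties": {
                "command_name": {"type": "string", "description": "The command to run"},
            },
            "required": ["command_name"],
        },
    },
}]

response = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Gear up."}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```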
I'll close this, as things (and our attitude) have changed: we now support a wide range of local LLMs.