A plethora of AI tools are currently available.
Many have, at most, rough support for non-NVIDIA GPUs.
This project stitches together a suite of model runners to build a voice-to-voice assistant by chaining voice-to-text, text-to-text, and text-to-voice.
All with an eye towards making existing non-NVIDIA support easier to use.
- End to end, voice-to-voice.
- Assistance with getting ROCm drivers and custom builds of whisper.cpp/llama.cpp that support ROCm-compatible GPUs.
- All models are loaded into RAM/VRAM for quick access.
Benchmarks are located here; you are more than welcome to submit yours.
- 🏃 Load piper into VRAM for persistence (remove model load time)
- ⚙️ Set up piper to use AMD GPU (requires custom builds of underlying libs like onnxruntime)
- 🗣️ More naturalistic responses in the voice output
- 📝 Implement usage of command functionality from whisper.cpp
- 💾 Potentially dockerize
- 🛠️ Fine-tune parameters of various components to optimize processing times
- 🤖 Bots? Bots.
- 🪟 Windows implementation
First, you (probably) need to be on Linux. If you're here, you might already know that ROCm is primarily supported on Red Hat, SUSE, and Debian. What you might not know is that other distros, like Arch, support it through user repos.
Second, you'll (probably) want `ffmpeg` for the `voice_query.sh` script. You don't necessarily need to use `ffmpeg`; `arecord` or `parec` would also work. You just need something that will generate 16000 Hz .wav files from your microphone.
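As a rough sketch of that recording step, the helper below prints a capture command for each of the three tools. The flags are the usual ones for a mono 16000 Hz WAV, but the function name `record_cmd`, the output name `input.wav`, and the PulseAudio `default` source are assumptions for illustration, not what `voice_query.sh` actually does.

```shell
#!/bin/sh
# Hypothetical helper: print a command that records mono 16000 Hz WAV audio.
# Nothing is recorded until the printed command is actually run.
record_cmd() {
    tool="$1"
    out="${2:-input.wav}"   # assumed output file name
    case "$tool" in
        ffmpeg)  echo "ffmpeg -f pulse -i default -ar 16000 -ac 1 $out" ;;
        arecord) echo "arecord -f S16_LE -r 16000 -c 1 $out" ;;
        parec)   echo "parec --rate=16000 --channels=1 --format=s16le --file-format=wav $out" ;;
        *)       echo "unknown tool: $tool" >&2; return 1 ;;
    esac
}
```

Running `record_cmd ffmpeg` prints the ffmpeg variant; paste it into a terminal and press q to stop recording. The `-f pulse` input assumes an ffmpeg build with PulseAudio support.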
- Kick off the building of the various components with `./setup.sh`
This script:
- Makes directories that will need to be manually filled by you with appropriate models
- Optionally downloads default models
- Pulls in the submodules
- Builds whisper.cpp and llama.cpp. If your GPU isn't on the ROCm compatibility list, you will probably want to rebuild llama.cpp with CLBlast flags instead. Check here for a comprehensive list of GPUs ROCm supports. Use the LLVM target that you need, and modify the buildAMD.sh script to build for your GPU.
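To find the LLVM target for your card, one option is to pull the `gfx` name out of ROCm's `rocminfo` output. The sketch below assumes `rocminfo` is installed and reports agent names such as `gfx1030`; the helper name `first_gfx_target` is made up for this example, and how buildAMD.sh consumes the target is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin and print the first gfx target
# name found in it (for example gfx1030). Prints nothing if none is found.
first_gfx_target() {
    grep -o 'gfx[0-9a-f][0-9a-f]*' | head -n 1
}
```

Typical use would be `rocminfo | first_gfx_target`. If it prints nothing, the GPU was not detected by ROCm, and the CLBlast route is probably the one to try.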
1b. Download models for the program to use if you don't want the defaults.
- llama.cpp: instructions here >> `.gguf` goes into the `llms` folder
- whisper.cpp: instructions here >> `.bin` goes into the `./whisper.cpp/models` folder
- piper: instructions here >> `.onnx` and `.onnx.json` go into the `voices` folder

Or, for some quick defaults, run `./defaultModels.sh`
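Before starting the servers, it can help to confirm that each folder actually holds a model of the expected type. This is a minimal sketch run from the repository root; `has_model` and `check_models` are hypothetical helpers, not part of the project's scripts.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR contains at least one file ending in EXT.
has_model() {
    dir="$1"
    ext="$2"
    for f in "$dir"/*"$ext"; do
        [ -f "$f" ] && return 0
    done
    return 1
}

# Report every folder from the list above that is still missing its model.
check_models() {
    status=0
    has_model ./llms .gguf               || { echo "no .gguf in ./llms"; status=1; }
    has_model ./whisper.cpp/models .bin  || { echo "no .bin in ./whisper.cpp/models"; status=1; }
    has_model ./voices .onnx             || { echo "no .onnx in ./voices"; status=1; }
    has_model ./voices .onnx.json        || { echo "no .onnx.json in ./voices"; status=1; }
    return "$status"
}
```

Calling `check_models` prints one line per missing model type and returns non-zero if anything is missing.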
- If you aren't comfortable with locating processes and terminating them manually, don't use the single-command script (run.sh). Instead, run each command in a separate terminal tab; that works just as well. Also, make sure to replace the model names with the models you downloaded.
individual commands
./whisper.cpp/server -m ./whisper.cpp/models/ggml-large-v3-q5_0.bin --port 6666 --no-timestamps
./llama.cpp/server -m ./llms/capybarahermes-2.5-mistral-7b.Q4_K_M.gguf -ngl 1000 --port 7777
# in a venv in piper/src/python_run
python -m piper.http_server -m ../../../voices/en_US-kusal-medium.onnx
single command
run.sh;
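If you would rather not hunt for stray processes by hand, one pattern is to remember each server's PID when you start it and stop them all together. This is only a sketch of that idea, not the contents of run.sh; `start_server`, `stop_servers`, and `SERVER_PIDS` are names invented for the example.

```shell
#!/bin/sh
# Hypothetical helpers: start long-running commands in the background,
# remember their PIDs, and stop them all later.
SERVER_PIDS=""

start_server() {
    "$@" &
    SERVER_PIDS="$SERVER_PIDS $!"
}

stop_servers() {
    for pid in $SERVER_PIDS; do
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true   # reap the process so it does not linger
    done
    SERVER_PIDS=""
}

# Example use (model names are the ones from the commands above):
#   start_server ./whisper.cpp/server -m ./whisper.cpp/models/ggml-large-v3-q5_0.bin --port 6666 --no-timestamps
#   start_server ./llama.cpp/server -m ./llms/capybarahermes-2.5-mistral-7b.Q4_K_M.gguf -ngl 1000 --port 7777
#   trap stop_servers EXIT INT TERM
```

With the `trap` in place, pressing Ctrl-C in the terminal that started the servers shuts both of them down.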
- Finally, you can trigger a voice input/output using:
voice_query.sh; paplay ./aiVoice.wav;
The main thing here is that you need something that will generate a 16000 Hz audio input to send through `talkV3.sh`. I prefer ffmpeg because I can hit q when I'm done.
The final output is a .wav file, which you can play back with any audio player to hear the assistant's response.
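For orientation, the whole round trip can be sketched as three HTTP calls, one per server. This is not the project's script: it assumes `curl` and `jq` are installed, that the whisper.cpp and llama.cpp servers listen on the ports used in the commands above, and that piper's HTTP server is on port 5000 (an assumption; adjust to your setup). The function names and the `n_predict` value of 128 are illustrative.

```shell
#!/bin/sh
# Sketch of the voice -> text -> text -> voice round trip.
WHISPER_URL="http://127.0.0.1:6666/inference"   # whisper.cpp server
LLAMA_URL="http://127.0.0.1:7777/completion"    # llama.cpp server
PIPER_URL="http://127.0.0.1:5000"               # assumed piper.http_server port

# Build the JSON body for the completion request.
# Escapes backslashes and double quotes; newlines are flattened to spaces.
# Other control characters are not handled.
completion_payload() {
    escaped=$(printf '%s' "$1" | tr '\n' ' ' | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"prompt": "%s", "n_predict": 128}' "$escaped"
}

voice_round_trip() {
    wav_in="$1"
    wav_out="${2:-aiVoice.wav}"
    # 1. voice-to-text
    text=$(curl -s "$WHISPER_URL" -F file="@$wav_in" -F response_format=json | jq -r '.text')
    # 2. text-to-text
    reply=$(curl -s "$LLAMA_URL" -H 'Content-Type: application/json' \
        -d "$(completion_payload "$text")" | jq -r '.content')
    # 3. text-to-voice
    curl -s -X POST -H 'Content-Type: text/plain' --data-raw "$reply" -o "$wav_out" "$PIPER_URL"
}
```

With all three servers running, `voice_round_trip input.wav` followed by `paplay ./aiVoice.wav` would play the reply.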
That's it!
whisper.cpp, piper, and llama.cpp are licensed under the MIT license.
The Echo mascot image was originally generated with the assistance of DALL·E 3. It was further edited by @JohnnySn0w.
- Currently, if the microphone and the output are hooked to the same interface (like a Scarlett DAC), there's a cutoff/delay at the beginning of the AI speech output. I'm not sure what's happening there, since Pulse should handle that sort of thing, and Discord works fine.