— Echo —
A completely local-compute AI assistant

A plethora of AI tools are currently available.

Many have, at most, rough support for non-NVIDIA GPUs.

This is an effort to collage together a suite of model-running programs into a voice-to-voice assistant, chaining voice-to-text, text-to-text, and text-to-voice.

All with an eye towards easier usage of existing non-NVIDIA support.

Current capabilities

End to end, voice-to-voice.
Assistance with getting ROCm drivers and custom builds of whisper.cpp/llama.cpp that support ROCm-compatible GPUs.
All models are loaded into RAM/VRAM for quick access.

Benchmarks are located here; you are more than welcome to submit yours.

Goals

🏃 Load piper into VRAM for persistence (remove model load time)
⚙️ Set up piper to use AMD GPU (requires custom builds of underlying libs like onnxruntime)
🗣️ More naturalistic responses in the voice output
📝 Implement usage of command functionality from whisper.cpp
💾 Potentially dockerize
🛠️ Fine-tuning parameters of various components to optimize processing times
🤖 Bots? Bots.
🪟 Windows implementation

Setup

Prerequisites

First, you (probably) need to be on Linux. If you're here, you might already know that ROCm is primarily supported on Red Hat, SUSE, and Debian. What you might not know is that other distros, like Arch, do support it through user repos.

Second, you'll (probably) want ffmpeg for the voice_query.sh script. You don't strictly need ffmpeg; arecord or parec would also work. You just need something that will generate 16000 Hz .wav files from your microphone.
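As a rough illustration of what "something that generates 16000 Hz .wav files" can look like, here is a minimal sketch. The `record_wav` helper and the `DRY_RUN` switch are hypothetical names made up for this example, and it assumes a PulseAudio `default` source for ffmpeg; it is not what voice_query.sh actually does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: record mono 16 kHz audio from the default microphone.
# Set DRY_RUN=1 to print the command instead of running it.
record_wav() {
  local tool="$1" out="$2"
  case "$tool" in
    # ffmpeg reading from PulseAudio; press q to stop recording
    ffmpeg)  ${DRY_RUN:+echo} ffmpeg -f pulse -i default -ar 16000 -ac 1 "$out" ;;
    # ALSA recorder, 16-bit little-endian samples; press Ctrl-C to stop
    arecord) ${DRY_RUN:+echo} arecord -f S16_LE -r 16000 -c 1 "$out" ;;
    *) echo "unknown recorder: $tool" >&2; return 1 ;;
  esac
}
```

For example, `record_wav ffmpeg input.wav` records until you press q, and `DRY_RUN=1 record_wav arecord input.wav` only prints the arecord command.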

Build & Ship

Kick off the building of the various components with

./setup.sh;

This script:

Makes directories that will need to be manually filled by you with appropriate models
Optionally downloads default models
Pulls in the submodules
Builds whisper.cpp and llama.cpp. If your GPU isn't on the ROCm compatibility list, you will probably want to rebuild llama.cpp with CLBlast flags instead. Check here for a comprehensive list of GPUs ROCm supports. Otherwise, pick the LLVM target that matches your GPU and modify the buildAMD.sh script so it builds for that target.
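If ROCm is already installed, one way to find the LLVM target for your GPU is to pull the `gfx` identifiers out of `rocminfo`. This is only a sketch: `gfx_targets` is a hypothetical helper name, and how the target is then wired into buildAMD.sh depends on that script's contents.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read rocminfo-style text on stdin and print the
# unique gfx target names it mentions (for example gfx1030 or gfx90a).
gfx_targets() {
  grep -oE 'gfx[0-9a-f]+' | sort -u
}
```

Typical use would be `rocminfo | gfx_targets`, then comparing the result against the ROCm support list before editing buildAMD.sh.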

Download models for the program to use if you don't want the defaults.

llama.cpp: instructions here >> .gguf goes into llms folder
whisper.cpp: instructions here >> .bin goes into ./whisper.cpp/models folder
piper: instructions here >> .onnx and .onnx.json go into voices folder

Or, for some quick defaults, run

./defaultModels.sh

If you aren't comfortable with locating processes and terminating them manually, don't use the single-command script below. Instead, run each command in a separate terminal tab; that works just as well. Either way, make sure to replace the model names with the models you downloaded.
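Before launching anything, it can help to confirm that the model files ended up in the folders listed above. The following is a minimal sketch; `has_match` and `check_models` are hypothetical helpers, and the folder layout is taken from the list above rather than from setup.sh itself.

```shell
#!/usr/bin/env bash
# Succeeds if at least one file under directory $1 matches glob $2.
has_match() {
  local f
  for f in "$1"/$2; do
    [ -e "$f" ] && return 0
  done
  return 1
}

# Report every expected model location that has no matching file.
# Returns non-zero if anything is missing.
check_models() {
  local root="${1:-.}" pattern missing=0
  for pattern in 'llms/*.gguf' 'whisper.cpp/models/*.bin' \
                 'voices/*.onnx' 'voices/*.onnx.json'; do
    if ! has_match "$root" "$pattern"; then
      echo "missing: $pattern"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_models` from the repository root prints one line per empty location and returns non-zero if any is missing.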

individual commands

./whisper.cpp/server -m ./whisper.cpp/models/ggml-large-v3-q5_0.bin --port 6666 --no-timestamps
./llama.cpp/server -m ./llms/capybarahermes-2.5-mistral-7b.Q4_K_M.gguf -ngl 1000 --port 7777
# in a venv in piper/src/python_run
python -m piper.http_server -m ../../../voices/en_US-kusal-medium.onnx
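If you would rather keep everything in one terminal without hunting for processes afterwards, a small wrapper can remember the PIDs it starts and kill them on exit. This is a sketch under assumptions, not the contents of run.sh; `start_bg` and `stop_all` are hypothetical names.

```shell
#!/usr/bin/env bash
PIDS=""

# Start a command in the background and remember its PID.
start_bg() {
  "$@" &
  PIDS="$PIDS $!"
}

# Terminate everything started through start_bg and reap the children.
stop_all() {
  local pid
  for pid in $PIDS; do
    kill "$pid" 2>/dev/null
  done
  wait 2>/dev/null
  PIDS=""
}

# Clean up on normal exit, Ctrl-C, or termination.
trap stop_all EXIT INT TERM

# Example usage (commented out so sourcing this file starts nothing):
# start_bg ./whisper.cpp/server -m ./whisper.cpp/models/ggml-large-v3-q5_0.bin --port 6666 --no-timestamps
# start_bg ./llama.cpp/server -m ./llms/capybarahermes-2.5-mistral-7b.Q4_K_M.gguf -ngl 1000 --port 7777
# wait   # block here; Ctrl-C stops both servers
```

The piper server can be started the same way from inside its venv in piper/src/python_run.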

single command

./run.sh;

Finally, you can trigger a voice input/output using:

./voice_query.sh; paplay ./aiVoice.wav;

The main thing here is that you need something that will generate 16000 Hz audio input to send through talkV3.sh. I prefer ffmpeg because I can hit q when I'm done.

The final output of this will be a .wav file, and you can use any audio player to play the assistant's response.
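To make the voice-to-text, text-to-text, text-to-voice flow concrete, here is a hedged sketch of how the three servers could be chained with curl. It is not the contents of voice_query.sh or talkV3.sh. The function names are hypothetical, the whisper and llama ports come from the individual commands above, port 5000 assumes piper's http_server default was left unchanged, jq is assumed to be installed, and the transcription is sent as a raw prompt with no chat template.

```shell
#!/usr/bin/env bash
WHISPER_URL="${WHISPER_URL:-http://127.0.0.1:6666}"
LLAMA_URL="${LLAMA_URL:-http://127.0.0.1:7777}"
PIPER_URL="${PIPER_URL:-http://127.0.0.1:5000}"   # assumed piper default port

# Voice -> text: upload a wav to the whisper.cpp server, get plain text back.
transcribe() {
  curl -s "$WHISPER_URL/inference" -F file="@$1" -F response_format="text"
}

# Text -> text: send the prompt to the llama.cpp server's /completion
# endpoint and extract the "content" field of the JSON reply.
ask_llm() {
  jq -n --arg p "$1" '{prompt: $p, n_predict: 128}' |
    curl -s "$LLAMA_URL/completion" -H 'Content-Type: application/json' -d @- |
    jq -r '.content'
}

# Text -> voice: POST text to piper's HTTP server and save the wav it returns.
speak() {
  curl -s -H 'Content-Type: text/plain' --data-raw "$1" -o "$2" "$PIPER_URL"
}

# Full round trip: input wav in, spoken answer wav out.
voice_round_trip() {
  local question answer
  question="$(transcribe "$1")"
  answer="$(ask_llm "$question")"
  speak "$answer" "$2"
}
```

With all three servers running, `voice_round_trip input.wav aiVoice.wav` followed by `paplay ./aiVoice.wav` would mirror the flow described above.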

That's it!

Licensing

whisper.cpp, piper, and llama.cpp are licensed under the MIT license.

The Echo mascot image was originally generated with the assistance of DALL·E 3. It was further edited by @JohnnySn0w.

Bugs

Currently, I have noticed that if the microphone and the output are hooked to the same interface (like a Scarlett DAC), there's a cutoff/delay at the beginning of the AI speech output. Not sure what's happening there, since Pulse should handle that sort of thing, and Discord works fine.

