
voicechat2

A fast, fully local AI Voicechat using WebSockets

Demo video: voicechat2.webm (unmute to hear the audio)

On a 7900-class AMD RDNA3 card, voice-to-voice latency is in the 1-second range:

  • Whisper large-v2 (Q5)
  • Llama 3 8B (Q4_K_M)
  • tts_models/en/vctk/vits (Coqui TTS default VITS models)

On a 4090, using Faster Whisper with faster-distil-whisper-large-v2, we can cut latency down to as low as 300 ms:

Demo video: voicechat2-fw.webm
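
Under the hood, the client and the backend servers exchange audio over WebSockets. As a rough illustration of the transport only (the URL, route, and framing below are hypothetical; the real protocol is defined by voicechat2's server code), a Python client would look something like:

import asyncio
import websockets  # pip install websockets

async def main():
    # Hypothetical endpoint -- check voicechat2's server code for the real route/port
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        with open("recording.opus", "rb") as f:
            await ws.send(f.read())      # send captured audio upstream
        async for chunk in ws:           # synthesized speech streams back in chunks
            print(f"received {len(chunk)} bytes")

asyncio.run(main())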

Install

These installation instructions are for Ubuntu LTS and assume you've already set up ROCm or CUDA.

I recommend using conda or (my preference) mamba for environment management; it will make your life easier.

System Prereqs

sudo apt update

# Not strictly required, but helpers we use
sudo apt install byobu curl wget

# Audio processing
sudo apt install espeak-ng ffmpeg libopus0 libopus-dev 

Check out the code

# Create env
mamba create -y -n voicechat2 python=3.11

# Setup
mamba activate voicechat2
git clone https://github.com/lhl/voicechat2
cd voicechat2
pip install -r requirements.txt

whisper.cpp

# Build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
# AMD version
# -DGGML_HIP_UMA=ON to work with APUs (but hurts dGPU perf)
GGML_HIPBLAS=1 make -j 
# Nvidia version
GGML_CUDA=1 make -j 

# Get model - large-v2 is 3094 MB
bash ./models/download-ggml-model.sh large-v2
# Quantized version - large-v2-q5_0 is 1080 MB
# bash ./models/download-ggml-model.sh large-v2-q5_0

# If you're continuing to the next step
cd ..
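
To smoke-test the whisper.cpp build by hand, you can run its bundled server example (flags as of mid-2024 builds; the host/port here are arbitrary choices, so check ./server --help for your version):

# from the whisper.cpp directory
./server -m models/ggml-large-v2.bin --host 127.0.0.1 --port 8080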

llama.cpp

# Build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# AMD version
make GGML_HIPBLAS=1 -j 
# Nvidia version
make GGML_CUDA=1 -j 

# Grab your preferred GGUF model
wget https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf

# If you're continuing to the next step
cd ..
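
Likewise, the llama.cpp build can be sanity-checked with its built-in server (the binary is ./llama-server in recent builds, plain ./server in older ones; the flags below are standard, but double-check against --help for your version):

# from the llama.cpp directory
./llama-server -m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf -ngl 99 -c 8192 --host 127.0.0.1 --port 8081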

TTS

mamba activate voicechat2
pip install TTS
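
To verify the install, you can synthesize a clip with the tts_models/en/vctk/vits model mentioned above (the first run downloads the model; p225 is one of the VCTK speaker IDs):

from TTS.api import TTS

# Load the multi-speaker VITS model trained on VCTK
tts = TTS("tts_models/en/vctk/vits")
# Any speaker ID listed in tts.speakers works here
tts.tts_to_file(text="Hello from voicechat2.", speaker="p225", file_path="test.wav")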

StyleTTS2

git clone https://github.com/yl4579/StyleTTS2.git
cd StyleTTS2
pip install -r requirements.txt
pip install phonemizer

# Download the LJSpeech Model
# https://huggingface.co/yl4579/StyleTTS2-LJSpeech/tree/main
# https://huggingface.co/yl4579/StyleTTS2-LibriTTS/tree/main
pip install huggingface_hub
huggingface-cli download --local-dir . yl4579/StyleTTS2-LJSpeech
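
If you'd rather use the multi-speaker LibriTTS model linked above, the analogous download is:

huggingface-cli download --local-dir . yl4579/StyleTTS2-LibriTTS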

Some extra convenience scripts for launching:

run-voicechat2.sh - on your GPU machine, tries to launch all servers in separate byobu sessions
remote-tunnel.sh - connect your GPU machine to a jump machine
local-tunnel.sh - connect to the GPU machine via a jump machine
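
The tunnel scripts presumably wrap standard SSH port forwarding; a minimal sketch with hypothetical hostnames and ports (the real scripts may differ):

# remote-tunnel.sh equivalent, run on the GPU machine:
# reverse-forward the server port up to the jump host
ssh -N -R 8000:localhost:8000 user@jump-host

# local-tunnel.sh equivalent, run on your local machine:
# forward that port from the jump host back down to localhost
ssh -N -L 8000:localhost:8000 user@jump-host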

Other AI Voicechat Projects

webrtc-ai-voice-chat

The demo shows a fair amount of latency (~10 s), and architecturally this is the furthest from voicechat2 of the projects listed here, since it uses WebRTC rather than WebSockets (HF Transformers, Ollama)

june

A console-based local client (HF Transformers, Ollama, Coqui TTS, PortAudio)

GlaDOS

This is a very responsive console-based local client app that also has VAD and interruption support, plus a really clever hook! (whisper.cpp, llama.cpp, piper, espeak)

local-talking-llm

Another console-based local client; more of a proof of concept, but with a blog writeup.

BUD-E - natural_voice_assistant

Another console-based local client (FastConformer, HF Transformers, StyleTTS2, espeak)

LocalAIVoiceChat

KoljaB has a number of interesting projects around console-based local clients like RealtimeSTT, RealtimeTTS, Linguflex, etc. (faster_whisper, llama.cpp, Coqui XTTS)

rtvi-web-demo

This is not a local voicechat client, but it has a neat WebRTC front-end, so it might be worth poking around in (Vite/React, Tailwind, Radix)