lhl/voicechat2

Switch SRT to faster-whisper with HF transformers fallback

Closed this issue · 1 comments

lhl commented

In benchmarking, whisper.cpp (considered for non-CUDA compatibility) is not significantly faster than HF transformers, even using the Q5 quantized model...

And faster-whisper (using CTranslate2; basically requires CUDA) is 2X+ faster even for short single-user processing, and distil-whisper models are 2X faster still.

lhl commented

Although faster-whisper is 4-5X faster for longer tasks, HF transformers seems to be slightly faster for short (one-sentence or so) conversational exchanges. We now have an srt-server.py that lets you swap in different backends.
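A swappable backend along these lines could be sketched as follows; this is a minimal illustration, not srt-server.py's actual interface. The model names, the `prefer_faster_whisper` flag, and the `transcribe()` wrapper are assumptions for the example, though the `faster_whisper.WhisperModel` and `transformers.pipeline` calls are the libraries' real entry points.

```python
# Sketch of a swappable speech-to-text backend with an HF transformers
# fallback. Hypothetical wrapper; model choices are illustrative.

def load_stt_backend(prefer_faster_whisper: bool = True):
    """Return a transcribe(path) -> str callable.

    Tries faster-whisper first (CTranslate2, effectively CUDA-only for
    good speed) and falls back to HF transformers if it is unavailable.
    """
    if prefer_faster_whisper:
        try:
            from faster_whisper import WhisperModel

            # distil-* models were ~2X faster still in the benchmarks above.
            model = WhisperModel("distil-large-v2", device="cuda",
                                 compute_type="float16")

            def transcribe(path: str) -> str:
                segments, _info = model.transcribe(path)
                return " ".join(seg.text.strip() for seg in segments)

            return transcribe
        except Exception:
            pass  # no CUDA / library missing: fall through to HF

    from transformers import pipeline

    # HF transformers: competitive for short conversational exchanges.
    pipe = pipeline("automatic-speech-recognition",
                    model="openai/whisper-large-v3")

    def transcribe(path: str) -> str:
        return pipe(path)["text"].strip()

    return transcribe
```

Keeping both paths behind one `transcribe()` signature is what makes the short-utterance vs. long-task tradeoff easy to benchmark per deployment.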