lhl/voicechat2

Switch SRT to faster-whisper with HF transformers fallback

Closed this issue · 1 comments

lhl commented

In benchmarking, whisper.cpp (considered for non-CUDA compatibility) is not significantly faster than HF transformers, even using the Q5 quantized model...

And faster-whisper (using CTranslate2; basically requires CUDA) is 2X+ faster even for short single-user processing, and distil-whisper models are 2X faster still.

lhl commented

Although faster-whisper is 4-5X faster for longer tasks, HF transformers seems to be slightly faster for short (one-sentence or so) conversational exchanges. We now have an srt-server.py that lets you swap in different backends.
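A swappable backend along these lines could be sketched as follows; this is a minimal illustration, not srt-server.py's actual interface. The model names, the `prefer_faster_whisper` flag, and the `transcribe()` wrapper are assumptions for the example, though the `faster_whisper.WhisperModel` and `transformers.pipeline` calls are the libraries' real entry points.

```python
# Sketch of a swappable speech-to-text backend with an HF transformers
# fallback. Hypothetical wrapper; model choices are illustrative.

def load_stt_backend(prefer_faster_whisper: bool = True):
    """Return a transcribe(path) -> str callable.

    Tries faster-whisper first (CTranslate2, effectively CUDA-only for
    good speed) and falls back to HF transformers if it is unavailable.
    """
    if prefer_faster_whisper:
        try:
            from faster_whisper import WhisperModel

            # distil-* models were ~2X faster still in the benchmarks above.
            model = WhisperModel("distil-large-v2", device="cuda",
                                 compute_type="float16")

            def transcribe(path: str) -> str:
                segments, _info = model.transcribe(path)
                return " ".join(seg.text.strip() for seg in segments)

            return transcribe
        except Exception:
            pass  # no CUDA / library missing: fall through to HF

    from transformers import pipeline

    # HF transformers: competitive for short conversational exchanges.
    pipe = pipeline("automatic-speech-recognition",
                    model="openai/whisper-large-v3")

    def transcribe(path: str) -> str:
        return pipe(path)["text"].strip()

    return transcribe
```

Keeping both paths behind one `transcribe()` signature is what makes the short-utterance vs. long-task tradeoff easy to benchmark per deployment.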