shirayu/whispering

Setting for real-time streaming ASR?

iwangjian opened this issue · 2 comments

Hi, I appreciate your excellent project! I ran the server and the client successfully, but ASR responds slowly even though I set --frame to a smaller value (100), --num_block to 80, and --vad to 0. Is it possible to use your project for real-time streaming ASR? If so, how should I set the parameters? Thank you.

Hi. For real-time processing, ASR must take less than 1 second for each 1-second interval of audio.
Real-time processing may be difficult because the current whisper is slow in general.
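
One way to check this condition on your own hardware is to measure the real-time factor (processing time divided by audio duration). The following is only a rough sketch: `real_time_factor` and `can_run_in_real_time` are hypothetical helper names, and `process` stands in for whatever function runs ASR on a chunk; neither is part of whispering.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    process: Callable[[Sequence[float]], object],  # hypothetical ASR callable
    audio: Sequence[float],
    sample_rate: int,
) -> float:
    """Return processing time divided by the duration of the audio."""
    audio_seconds = len(audio) / sample_rate
    start = time.perf_counter()
    process(audio)
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


def can_run_in_real_time(rtf: float) -> bool:
    """Real-time needs less than 1 second of work per 1 second of audio."""
    return rtf < 1.0
```

A value below 1.0 means the model keeps up with the incoming audio; a value above 1.0 means the delay grows the longer the stream runs.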

Processing time is mainly determined by GPU performance and model size.
Therefore, specifying a small model such as --model tiny is one way to speed it up.

Another way is to use VAD, which is lighter than whisper's processing.
If the VAD determines that a section is silent, it skips the whisper processing.
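
The idea of skipping silent sections can be illustrated roughly as follows. This sketch uses a simple energy threshold as a stand-in for a real VAD model, so it is not the VAD that whispering uses; `rms`, `transcribe_if_speech`, the `transcribe` callable, and the threshold value are all assumptions made for illustration.

```python
import math
from typing import Callable, Optional, Sequence


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square energy of a chunk of samples in [-1.0, 1.0]."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def transcribe_if_speech(
    chunk: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],  # hypothetical ASR callable
    threshold: float = 0.01,  # assumed value; tune for your microphone
) -> Optional[str]:
    """Run the expensive ASR step only when the chunk is not silent."""
    if rms(chunk) < threshold:
        return None  # treated as silence: the ASR step is skipped
    return transcribe(chunk)
```

Because the energy check is far cheaper than a whisper forward pass, every chunk judged silent saves nearly the full ASR cost for that chunk.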

Got it, thank you very much!