abb128/LiveCaptions

Considering using Whisper as a multilingual solution

Closed this issue · 1 comment

This is likely to require a strong machine for good results (probably a GPU), but there have been attempts at making it more suitable for RL settings.

Thank you for the suggestion. Whisper is indeed impressive, but running it at realtime speed requires prohibitively powerful hardware, which I don't have. Moreover, Whisper is not a streaming model. Making it stream requires repeatedly feeding it audio at whatever faster-than-realtime speed your computer can handle, which makes it even harder to run and results in relatively high latency. For these reasons, I don't think it's suitable for the live captions use case.
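A rough way to see why this gets expensive: in a simple sliding-window loop, each step re-transcribes the whole recent window rather than only the newly arrived audio. The sketch below is a back-of-the-envelope cost model under assumed parameters. The names `rtf` (real-time factor), `window_s` and `hop_s` are illustrative and do not come from Whisper or LiveCaptions, and it does not describe how any particular Whisper streaming project actually works.

```python
from dataclasses import dataclass


@dataclass
class PseudoStreamingCost:
    """Illustrative cost model for running a non-streaming model in a sliding-window loop.

    All parameters are assumptions for illustration, not measurements:
    - rtf: seconds of compute the model needs per second of audio
    - window_s: length of the audio window re-transcribed on every step
    - hop_s: how much new audio arrives between steps
    """
    rtf: float
    window_s: float
    hop_s: float

    def compute_per_step(self) -> float:
        # Every step re-processes the whole window, not just the new audio.
        return self.rtf * self.window_s

    def keeps_up(self) -> bool:
        # The loop keeps pace only if each step finishes before the next hop of audio arrives.
        return self.compute_per_step() <= self.hop_s

    def required_speedup(self) -> float:
        # How many times faster than realtime the model must run to keep up.
        return self.window_s / self.hop_s

    def worst_case_latency(self) -> float:
        # Speech arriving just after a step waits one hop, then for the next step to finish.
        # Only meaningful when keeps_up() is True; otherwise the backlog grows without bound.
        return self.hop_s + self.compute_per_step()


if __name__ == "__main__":
    # Hypothetical model running 4x faster than realtime (rtf = 0.25).
    slow = PseudoStreamingCost(rtf=0.25, window_s=8.0, hop_s=1.0)
    print(slow.required_speedup(), slow.keeps_up())  # 8.0 False

    fast = PseudoStreamingCost(rtf=0.25, window_s=8.0, hop_s=4.0)
    print(fast.keeps_up(), fast.worst_case_latency())  # True 6.0
```

Under these assumed numbers, shrinking the hop to get fresher captions raises the required speedup, while enlarging the hop to save compute raises latency. This mirrors the trade-off described above.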