Considering using Whisper as a multilingual solution
Closed this issue · 1 comment
zareami10 commented
This would likely require a powerful machine (probably a GPU) for good results, but there have been attempts to make it more suitable for RL settings.
abb128 commented
Thank you for the suggestion. Whisper is indeed impressive, but running it at real-time speed requires prohibitively powerful hardware, which I don't have. Moreover, Whisper is not a streaming model, so making it stream means repeatedly feeding it audio at whatever faster-than-real-time speed your computer can manage, which makes it even harder to run and results in relatively high latency. Because of this, I don't think it's suitable for the live captions use case.
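To illustrate the latency point, here is a rough sketch of the "feed a non-streaming model repeatedly" approach. It does not call Whisper at all. It only simulates timing, assuming a hypothetical real-time factor `rtf` (seconds of compute per second of audio) and a rolling window that is re-processed on every step. The function name, the parameters, and the numbers are all made up for illustration and are not measurements.

```python
from typing import List


def simulate_pseudo_streaming(
    total_seconds: float,
    step_seconds: float,
    window_seconds: float,
    rtf: float,
) -> List[float]:
    """Return the latency in seconds of each partial result.

    rtf is a hypothetical real-time factor: seconds of compute
    needed per second of audio fed to the model.
    """
    latencies = []
    compute_free_at = 0.0  # wall-clock time when the previous call finishes
    n_steps = int(total_seconds / step_seconds)
    for i in range(1, n_steps + 1):
        audio_end = i * step_seconds  # when this slice of audio exists
        # A non-streaming model has no state, so the whole rolling
        # window is fed again on every step.
        buffered = min(audio_end, window_seconds)
        # The call cannot start before the audio exists, nor before
        # the previous call has finished.
        start = max(audio_end, compute_free_at)
        compute_free_at = start + rtf * buffered
        latencies.append(compute_free_at - audio_end)
    return latencies


# Fast hardware (assumed rtf = 0.1): latency settles at rtf * window.
fast = simulate_pseudo_streaming(10.0, 1.0, 5.0, rtf=0.1)
# Slow hardware (assumed rtf = 0.5): each call takes longer than one
# step, so the backlog and the latency keep growing.
slow = simulate_pseudo_streaming(10.0, 1.0, 5.0, rtf=0.5)
print(fast[-1], slow[-1])
```

Under these assumptions, captions only keep up when `rtf * window_seconds` stays below `step_seconds`. Otherwise each call finishes after the next slice of audio has already arrived, and the delay accumulates.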