Questions: using v3 turbo + quantization + faster-whisper
thiswillbeyourgithub commented
Hi @abb128,
I'm running a faster-whisper-server backend on a computer and am blown away by the rapid advancement in the field.
Notably, even on my cheap low-end hardware I can transcribe text far faster than I can speak it, using the large-v3-turbo model with int8 quantization. Switching to faster-whisper, using the v3 turbo model, and applying quantization each produced a subjective leap in speed.
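For reference, here is a minimal sketch of the setup I'm describing (the audio path and beam size are just examples, and depending on your faster-whisper version the model identifier may need to point at a CTranslate2 conversion instead):

```python
from faster_whisper import WhisperModel

# int8 quantization keeps memory use low and speeds up CPU inference
model = WhisperModel("large-v3-turbo", device="cpu", compute_type="int8")

# transcribe() returns a lazy generator of segments plus language info
segments, info = model.transcribe("sample.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```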
Naturally, some questions come to mind regarding FUTO's voice input on Android:
- Does FUTO plan to redo the training with the latest v3 turbo models? They are more compact, so we can imagine having near-real-time transcription on Android with non-large models, right? If this isn't planned, why not / what is missing? Is there anything the community can do to help?
- Actually, looking at the notebook suggests this work was based on the original Whisper implementation by OpenAI. Is there a reason not to use faster-whisper from SYSTRAN? It's a much faster implementation (see the sketch above). I made a FUTO shoutout there, by the way.
- In the same vein, I don't see any quantization applied in the notebook; would that be a big speed enhancement too? Making the model smaller and faster would also considerably reduce the model loading time (see the conversion sketch after this list).
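For illustration, a pre-quantized conversion could look roughly like this; the checkpoint ID and output directory are just examples, and I'm assuming the TransformersConverter API from the ctranslate2 package:

```python
from ctranslate2.converters import TransformersConverter

# Convert the Hugging Face checkpoint to CTranslate2 format with int8 weights;
# the smaller on-disk model also cuts loading time considerably.
converter = TransformersConverter(
    "openai/whisper-large-v3-turbo",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("whisper-large-v3-turbo-ct2", quantization="int8")
```

The output directory can then be passed straight to WhisperModel. faster-whisper can also quantize on the fly at load time via compute_type, but a model quantized on disk is smaller and loads faster.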
Thanks a lot for everything you've been doing.