Execution for Diarization using whisperX is taking too long
Running the script on a 46-minute audio file on an A100 GPU is taking more than an hour. In the accompanying article, execution took about 15 minutes with the "base" Whisper model for transcription and diarization. When I run the same script, it takes over an hour, and the code cell is still executing. I'm not sure why it is so slow. Any help would be highly appreciated.
P.S. I used a pyannote.audio-based diarization script and it took just 2 minutes on the same audio, so I was expecting about 5 to 10 minutes for whisperX.
Hey @qaixerabbas, Whisper by itself is quite slow. However, batched inference was recently added to WhisperX, which should speed it up significantly (it is supposed to be designed for real-time inference). Could you try it and share some feedback?
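To narrow down where the hour is going, it may help to time each stage of the script separately and compare against the audio length. The snippet below is only a rough sketch using the standard library: `time_stages` and `real_time_factor` are hypothetical helper names, and the stage callables are placeholders for whatever your script actually runs for transcription, alignment, and diarization.

```python
import time
from typing import Callable, Dict, Tuple


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (below 1.0 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def time_stages(
    stages: Dict[str, Callable[[], object]], audio_seconds: float
) -> Dict[str, Tuple[float, float]]:
    """Run each stage once, in order; return {name: (elapsed_seconds, real_time_factor)}."""
    report = {}
    for name, run in stages.items():
        start = time.perf_counter()
        run()  # placeholder: call your own transcription/diarization step here
        elapsed = time.perf_counter() - start
        report[name] = (elapsed, real_time_factor(elapsed, audio_seconds))
    return report


if __name__ == "__main__":
    audio_seconds = 46 * 60  # the 46-minute recording from this issue

    # Dummy stages standing in for the real pipeline steps (assumption).
    stages = {
        "transcribe": lambda: time.sleep(0.02),
        "diarize": lambda: time.sleep(0.01),
    }
    for name, (elapsed, rtf) in time_stages(stages, audio_seconds).items():
        print(f"{name}: {elapsed:.2f} s (real-time factor {rtf:.5f})")

    # Reference points taken from the numbers reported above.
    print("article run:", round(real_time_factor(15 * 60, audio_seconds), 3))
    print("pyannote run:", round(real_time_factor(2 * 60, audio_seconds), 3))
```

If one stage dominates the total, that is the one worth rerunning with batching or a different configuration.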