Execution for Diarization using whisperX is taking too long

Question


Closed this issue a year ago · 1 comment

I'm running the script on a 46-minute audio file on an A100 GPU, and it has already been running for more than an hour; the code cell is still executing. In the accompanying article, transcription and diarization with the "base" Whisper model took about 15 minutes, so I'm not sure why my run is so much slower. Any help would be greatly appreciated.

P.S.: I ran a pyannote.audio-based diarization script on the same audio and it took just 2 minutes, so I was expecting WhisperX to finish in about 5 to 10 minutes.

Answer 1 · 2023-08-11T09:41:04.000Z

Hey @qaixerabbas, Whisper by itself is quite slow. However, WhisperX recently added batched inference, which should speed it up significantly (it is intended for real-time inference). Could you try it and share some feedback?
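To illustrate why batching helps, here is a minimal back-of-the-envelope sketch. It assumes a simplified model in which the audio is cut into fixed 30-second windows (Whisper's input length) and the windows are either processed one after another or grouped `batch_size` at a time. The helper names are hypothetical, and this is not WhisperX's actual code: WhisperX cuts audio with voice-activity detection rather than fixed windows, and wall-clock speedup also depends on GPU memory and decoding length.

```python
import math

# Assumption for illustration: fixed 30-second windows, matching Whisper's input length.
WINDOW_SECONDS = 30


def count_windows(duration_seconds: float, window: float = WINDOW_SECONDS) -> int:
    """Number of fixed-length windows needed to cover the whole audio."""
    return math.ceil(duration_seconds / window)


def count_sequential_batches(duration_seconds: float, batch_size: int) -> int:
    """How many batches must run one after another.

    batch_size=1 models processing every window sequentially; larger values
    model grouping several windows into one GPU call.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return math.ceil(count_windows(duration_seconds) / batch_size)


def make_batches(duration_seconds: float, batch_size: int) -> list[list[tuple[float, float]]]:
    """Return (start, end) offsets in seconds for each window, grouped into batches."""
    windows = []
    start = 0.0
    while start < duration_seconds:
        end = min(start + WINDOW_SECONDS, duration_seconds)
        windows.append((start, end))
        start = end
    return [windows[i:i + batch_size] for i in range(0, len(windows), batch_size)]


duration = 46 * 60  # the 46-minute file from the question
print(count_windows(duration))                         # windows to process
print(count_sequential_batches(duration, batch_size=1))
print(count_sequential_batches(duration, batch_size=16))
```

Under these assumptions a 46-minute file needs 92 windows: 92 sequential steps without batching versus 6 with a batch size of 16. Each batched step takes longer than a single-window step, so the real speedup is smaller than this ratio. In WhisperX itself, the README example passes the batch size to the transcription call (`model.transcribe(audio, batch_size=16)`); check the README for the version you have installed.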