mkiol/dsnote

Some processing steps may not be pipelined

Opened this issue · 1 comments

yump commented

When transcribing an hour of opus audio with either WhisperCPP Tiny or FasterWhisper Tiny, my CPU utilization looks like this:

[image: CPU utilization graph]

There is a lot of idle CPU time. According to the inserted statistics, FasterWhisper runs at roughly 25x real-time speed (61500 ms / 2414 ms ≈ 25.5). Is there some inherently serial part of the process that runs slower than 25x?

ffmpeg -i file.opus -f null - reports that the audio can be decoded at 430x speed, so decoding doesn't seem to be the bottleneck.

Hi, thanks for noticing this.

Periods of low CPU usage are most likely related to VAD (voice activity detection) processing. Currently, the STT decoder is fed audio data only when voice is detected. The slowdown comes from my implementation of the data transfer from the file reader to the VAD processor, which is very slow and inefficient. It needs to be rewritten.
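
For illustration only, here is a minimal sketch of how the reader, VAD, and decoder stages described above could be decoupled with bounded queues so that each stage runs concurrently. This is not dsnote's code: the function names, the energy-threshold VAD, and the placeholder decoder are all assumptions made for the example, and Python is used only to show the structure.

```python
import queue
import threading

SENTINEL = None  # marks end of stream (hypothetical convention for this sketch)


def reader(chunks, out_q):
    """Stage 1: push raw audio chunks downstream as soon as they are read."""
    for chunk in chunks:
        out_q.put(chunk)
    out_q.put(SENTINEL)


def vad(in_q, out_q, threshold=0.1):
    """Stage 2: forward only chunks whose mean absolute amplitude exceeds
    the threshold. This is a toy stand-in for a real voice activity detector."""
    while True:
        chunk = in_q.get()
        if chunk is SENTINEL:
            break
        if chunk and sum(abs(s) for s in chunk) / len(chunk) > threshold:
            out_q.put(chunk)
    out_q.put(SENTINEL)


def decoder(in_q, results):
    """Stage 3: placeholder for the STT decoder; it only records chunk lengths."""
    while True:
        chunk = in_q.get()
        if chunk is SENTINEL:
            break
        results.append(len(chunk))


def run_pipeline(chunks, maxsize=8):
    """Run the three stages in separate threads connected by bounded queues."""
    raw_q = queue.Queue(maxsize=maxsize)     # reader -> VAD
    voiced_q = queue.Queue(maxsize=maxsize)  # VAD -> decoder
    results = []
    threads = [
        threading.Thread(target=reader, args=(chunks, raw_q)),
        threading.Thread(target=vad, args=(raw_q, voiced_q)),
        threading.Thread(target=decoder, args=(voiced_q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    audio = [[0.0] * 4, [0.5] * 4, [0.01] * 3, [0.9, -0.9]]
    print(run_pipeline(audio))
```

The bounded queues keep a fast reader from buffering the whole file while still letting the VAD and decoder stages work on different chunks at the same time. In CPython, threads mainly help when the heavy work happens in native code that releases the interpreter lock, so treat this as a structural sketch rather than a performance claim.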

Let's keep this issue open. I will try to solve this problem in future releases.