Slow speech recognition

Question


Closed this issue 4 years ago · 3 comments

for 7:03 video.
Referenced PRs

#4
#3

Answer 1 · 2020-09-29T15:18:01.000Z

When the audio file is split into 1-minute chunks, each chunk takes around 30 seconds to transcribe, but a single 7-minute chunk takes only 70 seconds, so larger chunks of audio are preferred.

Additional stats to report:
A 28-minute video/audio file, split into 4-minute segments and processed with 8 subprocesses, took 2 min 23 s in total.
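
To compare these timings on one scale, a real-time factor (processing seconds per second of audio) can help. The sketch below only redoes the arithmetic on the figures quoted above, reading the first one as 30 seconds per 1-minute chunk. The names `real_time_factor` and `runs` are illustrative and not part of the repository.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Return seconds of processing per second of audio (lower is faster)."""
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return processing_s / audio_s


# (processing seconds, audio seconds) for the figures quoted in this thread.
runs = {
    "1-minute chunk": (30, 60),
    "single 7-minute chunk": (70, 7 * 60),
    "28 minutes, 4-minute segments, 8 subprocesses": (2 * 60 + 23, 28 * 60),
}

for label, (processing_s, audio_s) in runs.items():
    print(f"{label}: RTF = {real_time_factor(processing_s, audio_s):.3f}")
```

On these numbers the 1-minute chunks come out at about 0.5, the single 7-minute chunk at about 0.17, and the parallel run at about 0.09, which matches the conclusion that larger chunks are preferred.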

Answer 2 · 2020-09-29T15:34:50.000Z

Additional Tasks:

Benchmark with different options against different audio files.
Check whether words get cut off in the middle when the audio is divided into chunks.
- Might require changes in audiosplitter.py
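
One possible way to reduce the risk of words being cut at chunk boundaries is to let neighbouring chunks overlap by a short interval. The sketch below only computes the chunk start and end times. It is an assumption about how this could be approached, not a description of what audiosplitter.py does, and `chunk_bounds` and its parameters are hypothetical names.

```python
def chunk_bounds(total_s, chunk_s, overlap_s=0.0):
    """Return (start, end) times in seconds for chunks covering total_s.

    Consecutive chunks share overlap_s seconds, so a word that straddles
    one boundary is still fully contained in at least one chunk, provided
    the word is shorter than the overlap.
    """
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap_s must satisfy 0 <= overlap_s < chunk_s")

    bounds = []
    start = 0.0
    step = chunk_s - overlap_s  # distance between consecutive chunk starts
    while start < total_s:
        end = min(start + chunk_s, total_s)
        bounds.append((start, end))
        if end >= total_s:
            break  # the last chunk already reaches the end of the audio
        start += step
    return bounds


# Example: 28 minutes of audio, 4-minute chunks, 2 seconds of overlap.
for start, end in chunk_bounds(28 * 60, 4 * 60, overlap_s=2):
    print(f"{start:7.1f} s -> {end:7.1f} s")
```

With overlap, the same words can appear in two adjacent transcripts, so the transcripts would need de-duplicating when they are joined.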

To test, please do the following:

1. Switch to the speech_text branch.
2. Go to deep-read/workbench/speech/
3. Run setup.sh:
   chmod +x setup.sh
   ./setup.sh
4. Run speech.py and pass the path to the audio file as a command-line argument:
   chmod +x speech.py
   ./speech.py input.wav

You can convert a video to a .wav file with ffmpeg:

ffmpeg -i input.mp4 -vn deep-read/workbench/speech/output.wav
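
If the conversion needs to be scripted, the same ffmpeg invocation can be wrapped in Python. This is a minimal sketch assuming ffmpeg is installed and on the PATH. The helper names `build_ffmpeg_cmd` and `convert_to_wav` are hypothetical and do not exist in the repository.

```python
import shutil
import subprocess


def build_ffmpeg_cmd(video_path, wav_path):
    """Build the argument list for the ffmpeg command quoted above."""
    # -i names the input file; -vn drops the video stream so only audio is written.
    return ["ffmpeg", "-i", video_path, "-vn", wav_path]


def convert_to_wav(video_path, wav_path):
    """Run ffmpeg and raise if it is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    subprocess.run(build_ffmpeg_cmd(video_path, wav_path), check=True)
```

For example, `convert_to_wav("input.mp4", "deep-read/workbench/speech/output.wav")` would run the command shown above.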

Answer 3 · 2020-09-29T16:18:37.000Z

We will evaluate speech recognition with the Indian (EN) model.