Acquil/deep-read

Slow speech recognition

Closed this issue · 3 comments

(Image: for a 7:03 video.)

Transcribing each 1-minute split of the audio file takes around 30 seconds, but a single 7-minute chunk takes only 70 seconds, so larger chunks of audio are preferred.
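To make the comparison concrete, here is a small sketch that turns the timings above into a real-time factor (processing time divided by audio duration). The function name `real_time_factor` is made up for illustration, and the sequential estimate assumes the 1-minute splits are transcribed one after another.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; lower means faster."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Timings reported in this issue.
one_minute_split = real_time_factor(30, 60)       # 30 s per 1-minute split
single_chunk = real_time_factor(70, 7 * 60)       # 70 s for one 7-minute chunk

# Assumption: seven 1-minute splits processed sequentially.
sequential_estimate_seconds = 7 * 30

print(f"1-minute splits: {one_minute_split:.2f}x real time")
print(f"7-minute chunk:  {single_chunk:.2f}x real time")
print(f"7 splits in sequence: ~{sequential_estimate_seconds} s vs 70 s")
```

Under these numbers the single large chunk is roughly three times faster than the same audio cut into 1-minute pieces.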

Additional stats to report:
2 min 23 s elapsed for a 28-minute video/audio file (split into 4-minute segments) with 8 subprocesses.
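For reference, a rough sketch of how such a run could be laid out: the audio is cut into fixed-length segments and each segment is handed to a worker process. This is not the actual speech.py. `segment_bounds`, `transcribe_segment`, and `transcribe_parallel` are hypothetical names, and `transcribe_segment` is only a placeholder for whatever recognizer is used.

```python
from concurrent.futures import ProcessPoolExecutor


def segment_bounds(total_seconds, segment_seconds):
    """Return (start, end) pairs, in seconds, covering the whole audio."""
    if total_seconds < 0 or segment_seconds <= 0:
        raise ValueError("durations must be non-negative / positive")
    bounds = []
    start = 0
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


def transcribe_segment(job):
    """Hypothetical placeholder: a real worker would run the recognizer here."""
    path, start, end = job
    return f"[transcript of {path} from {start}s to {end}s]"


def transcribe_parallel(path, total_seconds, segment_seconds=240, max_workers=8):
    """Transcribe segments in separate processes and keep them in order."""
    jobs = [(path, s, e) for s, e in segment_bounds(total_seconds, segment_seconds)]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so transcripts can be joined directly.
        return list(pool.map(transcribe_segment, jobs))
```

With a 28-minute file and 4-minute segments, `segment_bounds(28 * 60, 240)` yields 7 segments, so 8 worker processes can take all of them at once. When calling `transcribe_parallel` from a script, place the call under `if __name__ == "__main__":` so that worker processes can start cleanly.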

Additional Tasks:

  • Benchmark with different options against different audio files.

  • Check whether words get cut off in the middle when dividing the audio into chunks.
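One possible way to approach the second task is to look at the signal energy around each cut point: if the samples near the boundary are quiet, the cut probably falls between words. The sketch below works on a plain list of 16-bit PCM sample values. The helper names, the 50 ms window, and the threshold of 500 are all assumptions chosen for illustration and would need tuning against real recordings.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of PCM sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def cut_lands_in_silence(samples, sample_rate, cut_seconds,
                         window_seconds=0.05, threshold=500.0):
    """Return True if the audio around the cut point is quieter than threshold.

    window_seconds and threshold are illustrative defaults, not measured values.
    """
    center = int(cut_seconds * sample_rate)
    half = max(1, int(window_seconds * sample_rate / 2))
    lo = max(0, center - half)
    hi = min(len(samples), center + half)
    return rms(samples[lo:hi]) < threshold


# Synthetic example: 1 s of loud signal, 1 s of silence, 1 s of loud signal.
rate = 1000
loud = [10000, -10000] * (rate // 2)
audio = loud + [0] * rate + loud

print(cut_lands_in_silence(audio, rate, 1.5))  # cut inside the silent second
print(cut_lands_in_silence(audio, rate, 0.5))  # cut inside the loud part
```

A cut that fails this check could be moved to the nearest quiet window, or adjacent chunks could be given a short overlap so a split word appears whole in at least one of them.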

To test, please do the following:

  1. Switch to the speech_text branch
  2. Go to deep-read/workbench/speech/
  3. Run setup.sh
    chmod +x setup.sh
    ./setup.sh
  4. Run speech.py, passing the path to the audio file as a command-line argument.
    chmod +x speech.py
    ./speech.py input.wav

You can convert a video to a .wav file with ffmpeg:
    ffmpeg -i input.mp4 -vn deep-read/workbench/speech/output.wav
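If the conversion needs to be scripted, for example when benchmarking several files, the same ffmpeg invocation can be wrapped in Python. This is only a sketch: `ffmpeg_wav_command` and `convert_to_wav` are hypothetical helpers, and they assume ffmpeg is installed and on the PATH.

```python
import subprocess


def ffmpeg_wav_command(video_path, wav_path):
    """Build the ffmpeg argument list: -vn drops the video stream."""
    return ["ffmpeg", "-i", video_path, "-vn", wav_path]


def convert_to_wav(video_path, wav_path):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(ffmpeg_wav_command(video_path, wav_path), check=True)
```

For example, `convert_to_wav("input.mp4", "deep-read/workbench/speech/output.wav")` runs the command shown above.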

We will evaluate speech recognition with the Indian (EN) model.