transcribe fails with 'bad alloc' or 'Failed to process waveform' error

Question


4or5trees opened this issue 3 years ago · 4 comments

Amazing tool! Works very well.

I do have an issue with the --transcribe option, though:

OS: Windows 10
Videogrep version: 2.0.1
Vosk version: 0.3.41

> videogrep -i '.\My-MKV-file.mkv' --transcribe
Transcribing .\My-MKV-file.mkv
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

(I've run it with both '.\My-MKV-file.mkv', which is how PowerShell autocompletes the path, and 'My-MKV-file.mkv' as the argument, with the same result.)

It looks like the error comes from one of the native DLLs that ship with videogrep and its dependencies. It happens roughly 10-20 seconds after I run the command.
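std::bad_alloc is the C++ exception thrown when a memory allocation fails, so it is raised in native code (most likely the vosk library) rather than in videogrep's Python. The paths in the traceback further down point at a Python37-32 install, i.e. a 32-bit interpreter, whose process gets only a few gigabytes of address space at most. This thread does not establish that as the cause, but it may be worth ruling out if you hit the same error. A minimal standard-library check of the interpreter's bitness:

```python
import platform
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer width of the running Python interpreter (32 or 64)."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(f"Python {platform.python_version()} ({interpreter_bits()}-bit) on {sys.platform}")
    if interpreter_bits() == 32:
        # Native libraries loaded into this process share its limited address space.
        print("32-bit interpreter: large allocations in native code may fail.")
```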

I am able to create a txt or srt with vosk using the vosk-transcriber directly.

Alternatively, when I specify my own vosk model, the command fails after 10-15 minutes with a somewhat more helpful stack trace:

videogrep.exe -i '.\My-MKV-file.mkv' --transcribe --model 'C:\Users\MyUser\.cache\vosk\vosk-model-en-us-0.22\'
Transcribing .\My-MKV-file.mkv
Traceback (most recent call last):
  File "c:\users\MyUser\appdata\local\programs\python\python37-32\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\MyUser\appdata\local\programs\python\python37-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\MyUser\AppData\Local\Programs\Python\Python37-32\Scripts\videogrep.exe\__main__.py", line 7, in <module>
  File "c:\users\MyUser\appdata\local\programs\python\python37-32\lib\site-packages\videogrep\cli.py", line 129, in main
    transcribe.transcribe(f, args.model)
  File "c:\users\MyUser\appdata\local\programs\python\python37-32\lib\site-packages\videogrep\transcribe.py", line 66, in transcribe
    rec.AcceptWaveform(data)
  File "c:\users\MyUser\appdata\local\programs\python\python37-32\lib\site-packages\vosk\__init__.py", line 170, in AcceptWaveform
    raise Exception("Failed to process waveform")
Exception: Failed to process waveform

(I've run it both with the trailing backslash, C:\Users\MyUser\.cache\vosk\vosk-model-en-us-0.22\, and without it, C:\Users\MyUser\.cache\vosk\vosk-model-en-us-0.22, and the error is the same in both cases.)

The AcceptWaveform function looks like this:

def AcceptWaveform(self, data):
    res = _c.vosk_recognizer_accept_waveform(self._handle, data, len(data))
    if res < 0:
        raise Exception("Failed to process waveform")
    return res
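AcceptWaveform returns a non-negative value on success (non-zero when an utterance has just ended and Result() is ready) and raises on a negative return, but the exception says nothing about where in the stream it happened. A minimal sketch of a feeding loop, in the style of vosk's example scripts, that re-raises with the byte offset of the failing chunk. The function name feed_recognizer is hypothetical; rec is assumed to be a vosk.KaldiRecognizer or anything with the same AcceptWaveform/Result/FinalResult methods:

```python
import json
from typing import Iterable, List


def feed_recognizer(rec, chunks: Iterable[bytes]) -> List[dict]:
    """Feed raw PCM chunks to a vosk-style recognizer and collect results.

    Hypothetical helper: failures are re-raised with the byte offset of the
    chunk that triggered them, to make errors easier to locate.
    """
    results = []
    offset = 0
    for chunk in chunks:
        if not chunk:  # empty read means end of stream
            break
        try:
            # Non-zero return: an utterance just ended and Result() is ready.
            if rec.AcceptWaveform(chunk):
                results.append(json.loads(rec.Result()))
        except Exception as exc:
            raise RuntimeError(
                f"recognizer failed on chunk of {len(chunk)} bytes at offset {offset}"
            ) from exc
        offset += len(chunk)
    # Flush whatever audio is still buffered in the recognizer.
    results.append(json.loads(rec.FinalResult()))
    return results
```

Knowing the offset makes it possible to cut out the problematic stretch of audio and reproduce the failure on a small file.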

Overall this isn't a blocking issue, since videogrep supports existing srt files, and with a bit of editing I can still easily create a supercut.

But do you have any idea what could be causing this?

Answer 1 · 2022-05-27T17:00:21.000Z

Interesting! Will definitely look into this. I based the transcription on one of vosk's example scripts rather than on vosk-transcriber. I'll take a look at it and maybe I can figure out what's going wrong...
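For context, vosk's ffmpeg example decodes the input with ffmpeg to 16 kHz mono 16-bit PCM and reads stdout in 4000-byte chunks. A minimal sketch of that decoding step, illustrating the general approach rather than videogrep's actual code before or after the fix. It assumes ffmpeg is on PATH; the helper names are hypothetical:

```python
import subprocess
from typing import Iterator, List

SAMPLE_RATE = 16000  # must match the rate given to the recognizer
CHUNK_SIZE = 4000    # bytes per read, similar to vosk's example scripts


def ffmpeg_pcm_command(path: str, sample_rate: int = SAMPLE_RATE) -> List[str]:
    """Build an ffmpeg command that decodes `path` to mono 16-bit PCM on stdout."""
    return [
        "ffmpeg", "-loglevel", "quiet",
        "-i", path,
        "-ar", str(sample_rate),  # resample to the recognizer's rate
        "-ac", "1",               # downmix to mono
        "-f", "s16le",            # raw signed 16-bit little-endian samples
        "-",                      # write to stdout
    ]


def pcm_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield raw PCM chunks decoded from a media file (requires ffmpeg on PATH)."""
    with subprocess.Popen(ffmpeg_pcm_command(path), stdout=subprocess.PIPE) as proc:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Each chunk would then be passed to a vosk KaldiRecognizer created with the same SAMPLE_RATE; a mismatch between the decoded rate and the recognizer's rate produces poor transcripts.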

Answer 2 · 2022-05-27T17:07:29.000Z

Also, do you happen to have a smallish file that fails to transcribe that I could test with?

Answer 3 · 2022-05-27T18:00:49.000Z

@4or5trees I've just made some small changes to the transcribing code based on vosk-transcriber. I haven't published to pypi yet, but if you'd like you can test it out by installing directly from the repo:

pip3 install git+https://github.com/antiboredom/videogrep

Answer 4 · 2022-05-27T22:45:20.000Z


Wow, that fixes the issue for me! Transcribing works now, including when I specify a custom model path.

Thank you so much for fixing it, and so quickly too!