Sharrnah/whispering

Error with Silero VAD

YasinSharifbeigy opened this issue · 2 comments

I faced an error in audio_processing_recording.py at line 78, when passing audio into vad_model.run_vad():

Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/vad/model/vad_annotator.py", line 26, in forward
    if _2:
      _3 = torch.format(_0, (torch.size(x0))[-1])
      ops.prim.RaiseException(_3, "builtins.ValueError")
      ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    else:
      pass

It seems the input audio should have the shape (channels_num, 512) instead of (1536,). This means the input tensor must be two-dimensional, and for SAMPLE_RATE=16000 its length must be 512.
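
A minimal sketch of that constraint (not this project's run_vad() wrapper), assuming the Silero VAD model is loaded through torch.hub as in the silero-vad README, with random audio as a stand-in:

import torch

model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")

SAMPLE_RATE = 16000
WINDOW_SIZE = 512  # newer Silero VAD releases only accept 512 samples at 16 kHz

audio = torch.randn(SAMPLE_RATE)  # stand-in for one second of mono 16 kHz audio

# Feed the model fixed 512-sample windows instead of one 1536-sample buffer.
speech_probs = []
for start in range(0, audio.shape[-1] - WINDOW_SIZE + 1, WINDOW_SIZE):
    chunk = audio[start:start + WINDOW_SIZE]               # shape (512,)
    speech_probs.append(model(chunk, SAMPLE_RATE).item())  # speech probability per chunk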

Alternatively, we can use _vad_model.audio_forward() instead of passing the audio directly through _vad_model.
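
For reference, a hedged sketch of that alternative, again assuming the raw silero-vad model from torch.hub rather than this project's wrapper: audio_forward accepts a longer buffer and does the 512-sample chunking (and any padding) internally.

import torch

model, _ = torch.hub.load("snakers4/silero-vad", "silero_vad")

buffer = torch.randn(1536)                        # an old-style 1536-sample buffer
chunk_probs = model.audio_forward(buffer, 16000)  # one speech probability per 512-sample chunk
print(chunk_probs)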

This is strange, because I haven't had any issues with 1536, and many VAD examples still mention this number.

See also snakers4/silero-vad#322, which also mentions higher quality if the window is bigger.

But I will investigate further.

In the meantime, you can change it via the vad_frames_per_buffer setting, either by directly editing settings.yaml or by going to Advanced -> Settings.
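
If you prefer to script the change, here is a hypothetical Python sketch that sets the key in settings.yaml. It assumes a flat top-level key named vad_frames_per_buffer and that PyYAML is installed; editing the file by hand works just as well.

import yaml  # pip install pyyaml

with open("settings.yaml", "r", encoding="utf-8") as f:
    settings = yaml.safe_load(f) or {}

settings["vad_frames_per_buffer"] = 512  # window size the newer Silero VAD expects at 16 kHz

with open("settings.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(settings, f, default_flow_style=False)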

Sorry, I didn't want this to be closed automatically.

But this will be fixed in the next update. :)