Error with Silero VAD
YasinSharifbeigy opened this issue · 2 comments
I hit an error at line 78 of audio_processing_recording.py, when passing audio into vad_model.run_vad():
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/vad/model/vad_annotator.py", line 26, in forward
if _2:
_3 = torch.format(_0, (torch.size(x0))[-1])
ops.prim.RaiseException(_3, "builtins.ValueError")
~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
else:
pass
It seems the input audio should have shape (channels_num, 512) instead of (1536,). In other words, the input tensor must be 2-dimensional, and for SAMPLE_RATE=16000 its length must be 512.
Alternatively, we can use _vad_model.audio_forward()
instead of passing the audio directly through _vad_model.
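To illustrate the shape requirement described above, here is a minimal sketch of splitting a 1536-sample buffer into 512-sample windows before handing each one to the model. The helper name split_into_windows and the stand-in buffer are hypothetical and not part of this project or of Silero VAD; the 512-sample window at 16 kHz is taken from the error discussed above.

```python
SAMPLE_RATE = 16000
WINDOW = 512  # samples per VAD call at 16 kHz, per the error discussed above


def split_into_windows(samples, window=WINDOW):
    """Split a 1-D sequence into full windows plus a leftover tail.

    Hypothetical helper: works on anything that supports len() and
    slicing (a list, a NumPy array, or a 1-D torch tensor).
    """
    n_full = len(samples) // window
    windows = [samples[i * window:(i + 1) * window] for i in range(n_full)]
    leftover = samples[n_full * window:]  # too short for a full window
    return windows, leftover


buffer = [0.0] * 1536  # stand-in for one recorded buffer of 1536 samples
windows, leftover = split_into_windows(buffer)
print(len(windows), len(leftover))  # prints "3 0"
```

If the windows are torch tensors, each one would presumably still need a leading channel dimension (for example via unsqueeze(0)) to reach the (channels_num, 512) shape before being passed to the model.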
This is strange, because I haven't had any issues with 1536, and many VAD examples still mention this number.
See also snakers4/silero-vad#322,
which also mentions higher quality if the window is bigger.
But I will investigate further.
In the meantime, you can change it via the vad_frames_per_buffer
setting, either by editing settings.yaml directly or by going to Advanced -> Settings.
Sorry, I didn't want this to be closed automatically.
But this will be fixed in the next update. :)