ladenedge/WebRtcVadSharp

HasSpeech is always true

Closed this issue · 4 comments

Hello,

I've tried everything that I can think of. I have a very simple implementation here. Really hoping to get some advice. This is going to be a life saver library for my project.

I'm passing in a 16 kHz, mono WAV file encoded as pcm_s16le.

I'm on version 1.3.1, testing on Windows 10 Build 19042

using var vad = new WebRtcVad()
{
    OperatingMode = OperatingMode.Aggressive,
    FrameLength = FrameLength.Is20ms,
    SampleRate = SampleRate.Is16kHz,
};

// I tried with * 1 instead of * 2 here as well, since the wav I'm using is mono channel
var frameSize = (int)vad.SampleRate / 1000 * 2 * (int)vad.FrameLength;

var audioBytes = await File.ReadAllBytesAsync("birds.wav");

for (var i = 0; i < audioBytes.Length - frameSize; i += frameSize)
{
    var hasSpeech = vad.HasSpeech(audioBytes.Skip(i).Take(frameSize).ToArray());

    if (hasSpeech)
    {
        // inspecting with breakpoint here, always hits on first pass, when there is no speech.
        break;
    }
}

WebRTC doesn't work with WAV files directly; it only works with raw audio. So while your codec looks good, the WAV file also includes a header with metadata about that audio, which WebRTC doesn't understand. You'll need to send it only the audio inside the WAV container by either:

  • manually converting your WAV file to PCM/RAW with, e.g., FFmpeg or Audacity, or
  • (perhaps better) reading your audio through a library like NAudio, which can parse the WAV metadata and give you the raw audio stream.

Here's some untested sample code that should get you started with the latter approach:

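// Requires: using NAudio.Wave; using WebRtcVadSharp;
using var vad = new WebRtcVad()
{
    // These must match the audio in the file (16 kHz here).
    FrameLength = FrameLength.Is20ms,
    SampleRate = SampleRate.Is16kHz,
};
using var audio = new WaveFileReader(wavAudio); // wavAudio: your WAV file
var fmt = audio.WaveFormat;
// Bytes per frame: frame length in ms * samples per ms * channels * bytes per sample.
var frameBytes = (int)vad.FrameLength * fmt.SampleRate / 1000 * fmt.Channels * fmt.BitsPerSample / 8;
var audioData = new byte[frameBytes];
while (true)
{
   // Stop at the first incomplete frame (normally the end of the file).
   if (await audio.ReadAsync(audioData.AsMemory()) != audioData.Length)
      break;
   var hasSpeech = vad.HasSpeech(audioData);
}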
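
The frame-size arithmetic is easy to get wrong, so here is a minimal sketch that checks it in isolation. FrameMath.FrameBytes is a hypothetical helper written for illustration; it is not part of WebRtcVadSharp or NAudio.

```csharp
using System;

// Hypothetical helper for illustration only (not a WebRtcVadSharp or NAudio API).
public static class FrameMath
{
    // Number of bytes in one frame of uncompressed PCM audio.
    // Assumes the sample rate is a whole multiple of 1000 Hz
    // and the bit depth is a whole number of bytes.
    public static int FrameBytes(int sampleRateHz, int frameMs, int channels, int bitsPerSample)
    {
        int samplesPerChannel = sampleRateHz / 1000 * frameMs;
        int bytesPerSample = bitsPerSample / 8;
        return samplesPerChannel * channels * bytesPerSample;
    }
}
```

For the file in this issue (16 kHz, mono, 16-bit, 20 ms frames), FrameMath.FrameBytes(16000, 20, 1, 16) works out to 320 samples of 2 bytes each, i.e. 640 bytes per frame.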

Good luck!

I'll try out the raw file today. Sounds like this will definitely solve it. Thank you!

How do I pass an array of float32 samples to it?

You'll need to convert your 32-bit IEEE floats to linear 16-bit PCM. NAudio can do this if you use .NET; otherwise, it looks like you might be able to adapt someone's manual conversion code.
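
If you'd rather do the conversion by hand in .NET, a minimal sketch could look like the following. It assumes the floats are normalized to the range [-1.0, 1.0] and that the output should be little-endian, as in pcm_s16le. PcmConvert.FloatToPcm16 is a hypothetical helper, not something provided by WebRtcVadSharp or NAudio.

```csharp
using System;

// Hypothetical helper for illustration only (not a WebRtcVadSharp or NAudio API).
public static class PcmConvert
{
    // Converts float samples (assumed to be in [-1.0, 1.0]) to
    // 16-bit signed little-endian PCM, two bytes per sample.
    public static byte[] FloatToPcm16(float[] samples)
    {
        var pcm = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Clamp first so out-of-range input cannot overflow the short.
            float clamped = Math.Clamp(samples[i], -1f, 1f);
            short value = (short)MathF.Round(clamped * short.MaxValue);

            pcm[2 * i] = (byte)(value & 0xFF);            // low byte
            pcm[2 * i + 1] = (byte)((value >> 8) & 0xFF); // high byte
        }
        return pcm;
    }
}
```

The resulting byte array can then be cut into frames and passed to HasSpeech as in the earlier example. The conversion only changes the sample format, so the audio still has to be mono and at a sample rate the VAD supports.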