
WebRTC-based Voice Activity Detection library


Voice Activity Detection based on the method used in the upcoming WebRTC HTML5 standard. Extracted from Chromium for stand-alone use as a library.

The sample MPEG audio decoder is a stripped-down libmpeg123 from MPEG123.

The Node.js bindings provide a simple way to do VAD on PCM audio input. Input data needs to be constant bitrate normalised float (-1..+1) PCM audio samples. Detection results are returned using an async callback and additionally via events.
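Many audio sources deliver 16-bit integer PCM rather than normalised floats. The following is a minimal sketch of one way to convert such data before handing it to the VAD. The helper name `int16ToFloat32Buffer` is hypothetical and not part of this library, and the sketch assumes little-endian samples.

```javascript
// Hypothetical helper (not part of the library): converts 16-bit signed
// little-endian PCM into normalised 32-bit float PCM in the range -1..+1.
function int16ToFloat32Buffer(int16Buffer) {
  var sampleCount = Math.floor(int16Buffer.length / 2)
  var floatBuffer = Buffer.alloc(sampleCount * 4)
  for (var i = 0; i < sampleCount; i++) {
    // Dividing by 32768 maps the integer range -32768..32767 into -1..+1
    var sample = int16Buffer.readInt16LE(i * 2) / 32768
    floatBuffer.writeFloatLE(sample, i * 4)
  }
  return floatBuffer
}
```

The resulting Buffer holds four bytes per sample, so it is twice the size of the 16-bit input.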

Supported sample rates are:

  • 8000Hz*
  • 16000Hz*
  • 32000Hz
  • 48000Hz

*recommended sample rate for best performance/accuracy tradeoff
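Because only these four rates are listed as supported, it can help to validate the rate before processing. A small sketch, where `isSupportedSampleRate` is a hypothetical guard and not something the library provides:

```javascript
// Sample rates listed as supported above, in Hz.
var SUPPORTED_SAMPLE_RATES = [8000, 16000, 32000, 48000]

// Hypothetical guard, not part of the library.
function isSupportedSampleRate(sampleRate) {
  return SUPPORTED_SAMPLE_RATES.indexOf(sampleRate) !== -1
}
```

A check like this catches common slips such as passing 44100 or a mistyped 160000.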



var VAD = require('vad').VAD


Create a new VAD object using the given mode. The 'mode' parameter is optional.

.processAudio(samples, samplerate, callback)

Analyse the given samples (Buffer object containing normalised 32bit float values) and notify the detected voice event via callback and event.

.on(event, callback)

Subscribe to an event emitted by the VAD instance after detection. The event data provided to the callback is a number that represents the voice detection result. Supported event names are:

  • 'event': VAD processing finished successfully or with an error
  • 'voice': Human speech was detected
  • 'silence': Silence/non-speech was detected
  • 'noise': [not implemented yet]
  • 'error': an error occurred during detection
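As an illustration of subscribing to these events, the sketch below tallies how often each result occurs. It assumes only that `vad` exposes `.on(event, callback)` as described above. The helper name `countDetections` is hypothetical.

```javascript
// Hypothetical helper: subscribes to the documented event names on a
// VAD-like emitter and tallies how often each one fires.
function countDetections(vad) {
  var counts = { voice: 0, silence: 0, error: 0 }
  vad.on('voice', function () { counts.voice++ })
  vad.on('silence', function () { counts.silence++ })
  vad.on('error', function () { counts.error++ })
  return counts
}
```

The returned object is updated in place as events arrive, so it can be inspected at any point while audio is being processed.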

Event codes

Event codes are passed to the processAudio callback and to event handlers subscribed to the general 'event' event.


Constant for voice detection errors. Passed to 'error' event handlers.


Constant for voice detection results with no detected voices. Passed to 'silence' event handlers.


Constant for voice detection results with detected voice. Passed to 'voice' event handlers.


Constant for voice detection results with detected noise. Not implemented yet.

Available VAD Modes

These constants can be used as the mode parameter of the VAD constructor to configure the VAD algorithm.


Constant for normal voice detection mode. Suitable for high bitrate, low-noise data. May classify noise as voice, too. This is the default if mode is omitted in the constructor.


Detection mode optimised for low-bitrate audio.


Detection mode best suited for somewhat noisy, lower quality audio.


Detection mode with lowest miss-rate. Works well for most inputs.


Utility function that converts a Buffer object to a TypedArray of type Float32Array. Works with node <0.12 as well as recent versions. Introduced as a node version-agnostic shim.
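To show what such a conversion involves, here is an illustrative sketch. It is not the library's implementation, the name `bufferToFloat32Array` is hypothetical, and it assumes the Buffer holds little-endian 32-bit floats.

```javascript
// Illustrative sketch only; the library's own utility may work differently.
// Copies the samples one by one, so the result does not depend on the
// Buffer's memory alignment or on later changes to the Buffer.
function bufferToFloat32Array(buffer) {
  var sampleCount = Math.floor(buffer.length / 4)
  var floats = new Float32Array(sampleCount)
  for (var i = 0; i < sampleCount; i++) {
    floats[i] = buffer.readFloatLE(i * 4)
  }
  return floats
}
```

Copying costs more than creating a view over the same memory, but it avoids the alignment requirements that a Float32Array view places on the underlying byte offset.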


The library is designed with input streams in mind: sample buffers fed to processAudio should be rather short (36ms to 144ms, depending on your needs) and the sample rate no higher than 32kHz. Sample rates higher than 16kHz provide no benefit to the VAD algorithm, as human voice patterns center around 4000 to 6000Hz. Minding the Nyquist frequency yields sample rates between 8000 and 12000Hz for best results.
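The buffer durations above translate into byte sizes as follows. This is a sketch under the assumption of single-channel audio with 32-bit float samples. Both helper names are hypothetical and not part of the library.

```javascript
var BYTES_PER_SAMPLE = 4 // 32-bit float samples

// Hypothetical helper: bytes needed for `durationMs` of single-channel audio.
function chunkByteLength(sampleRate, durationMs) {
  return Math.round(sampleRate * durationMs / 1000) * BYTES_PER_SAMPLE
}

// Hypothetical helper: splits a Buffer into chunks of at most `chunkBytes`
// bytes. The last chunk may be shorter than the others.
function splitIntoChunks(buffer, chunkBytes) {
  if (chunkBytes <= 0) {
    throw new RangeError('chunkBytes must be positive')
  }
  var chunks = []
  for (var offset = 0; offset < buffer.length; offset += chunkBytes) {
    var end = Math.min(offset + chunkBytes, buffer.length)
    chunks.push(buffer.subarray(offset, end))
  }
  return chunks
}
```

At 16000Hz, a 36ms chunk is 576 samples (2304 bytes) and a 144ms chunk is 2304 samples (9216 bytes). Each chunk returned by `splitIntoChunks` shares memory with the original Buffer rather than copying it.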


var VAD = require('vad').VAD

// mode is optional; omitting it selects the default detection mode
var vad = new VAD()

var pcmInputStream = getReadableAudioStreamSomehow()
var pcmOutputStream = getWritableStreamSomehow()

vad.on('voice', function() {
  console.info('Voice detected!')
})

// this example tries to remove non-speech from an audio file
pcmInputStream.on('data', function(chunk) {
  // assume audio data is 32bit float @ 16kHz
  vad.processAudio(chunk, 16000, function(error, event) {
    if (event === VAD.EVENT_VOICE) {
