This Voice Activity Detection (VAD) library processes audio in real time, using a Gaussian Mixture Model (GMM) to identify the presence of human speech in audio that contains a mixture of speech and noise. The VAD works offline, and all processing is done on the device.
The library is based on Google's WebRTC VAD, which is reportedly one of the best available: it is fast, modern, and free. The algorithm has been widely adopted and has become one of the gold standards for delay-sensitive scenarios such as web-based interaction.
If you need higher accuracy and faster processing, I recommend using Deep Neural Networks (DNN). For reference, see the following paper comparing DNN and GMM approaches.
The VAD library accepts only 16-bit mono PCM audio and works only with specific combinations of sample rate, frame size, and classifier (mode).
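The underlying WebRTC VAD only processes frames of 10, 20, or 30 ms, so the frame size (in samples) is tied to the sample rate: samples per frame = sample rate × frame duration in ms / 1000. The sketch below computes this, assuming the library inherits WebRTC's frame-duration constraint; `FrameSizes`, `validFrameSizes`, and `frameDurationMs` are hypothetical names, not part of the library's API.

```java
import java.util.Arrays;

public class FrameSizes {
    // WebRTC VAD accepts frames of 10, 20 or 30 ms only
    private static final int[] FRAME_DURATIONS_MS = {10, 20, 30};

    // Hypothetical helper: samples per frame for each allowed frame duration
    static int[] validFrameSizes(int sampleRateHz) {
        int[] sizes = new int[FRAME_DURATIONS_MS.length];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = sampleRateHz * FRAME_DURATIONS_MS[i] / 1000;
        }
        return sizes;
    }

    // Hypothetical helper: duration in ms covered by a frame of frameSize samples
    static int frameDurationMs(int sampleRateHz, int frameSize) {
        return frameSize * 1000 / sampleRateHz;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(validFrameSizes(16000))); // [160, 320, 480]
        System.out.println(frameDurationMs(16000, 160));             // 10
    }
}
```

For example, a 160-sample frame at 16 kHz covers 10 ms of audio.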
Silence duration (ms): used by the Continuous Speech detector. It defines how long negative (non-speech) results must persist before the audio is recognized as silence.
Voice duration (ms): used by the Continuous Speech detector. It defines how long positive (speech) results must persist before the audio is recognized as speech.
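The two durations act as a hysteresis on top of per-frame results: the state switches to speech only after enough consecutive speech frames, and back to silence only after enough consecutive non-speech frames. The sketch below illustrates that behaviour; `ContinuousSpeechTracker` is a hypothetical class, not part of the library, and it is not necessarily how the library implements the Continuous Speech detector internally.

```java
public class ContinuousSpeechTracker {
    private final int voiceFrames;   // consecutive speech frames needed to switch to speech
    private final int silenceFrames; // consecutive non-speech frames needed to switch to silence
    private int speechCount = 0;
    private int silenceCount = 0;
    private boolean speaking = false;

    // Hypothetical constructor: converts the durations into frame counts
    ContinuousSpeechTracker(int frameDurationMs, int voiceDurationMs, int silenceDurationMs) {
        this.voiceFrames = Math.max(1, voiceDurationMs / frameDurationMs);
        this.silenceFrames = Math.max(1, silenceDurationMs / frameDurationMs);
    }

    // Feed one per-frame result; returns the smoothed speech/silence state
    boolean update(boolean frameIsSpeech) {
        if (frameIsSpeech) {
            speechCount++;
            silenceCount = 0;
            if (!speaking && speechCount >= voiceFrames) {
                speaking = true;
            }
        } else {
            silenceCount++;
            speechCount = 0;
            if (speaking && silenceCount >= silenceFrames) {
                speaking = false;
            }
        }
        return speaking;
    }

    public static void main(String[] args) {
        // 10 ms frames (160 samples at 16 kHz), 500 ms voice and silence durations
        ContinuousSpeechTracker tracker = new ContinuousSpeechTracker(10, 500, 500);
        boolean state = false;
        for (int i = 0; i < 50; i++) state = tracker.update(true);  // 500 ms of speech
        System.out.println(state); // true
        for (int i = 0; i < 30; i++) state = tracker.update(false); // 300 ms pause
        System.out.println(state); // true: the pause is shorter than the silence duration
        for (int i = 0; i < 20; i++) state = tracker.update(false); // 500 ms of non-speech in total
        System.out.println(state); // false
    }
}
```

With 10 ms frames, a 500 ms duration corresponds to 50 consecutive frames, so a short pause between sentences does not end the detected utterance.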
Recommended parameters:
- Sample Rate: 16 kHz
- Frame Size: 160 samples
- Mode: VERY_AGGRESSIVE
- Silence Duration: 500 ms
- Voice Duration: 500 ms
The VAD supports two ways of detecting speech:
- Continuous Speech listener: designed to detect long utterances without reporting the end of speech when the user pauses between sentences.
```java
// audioFrame: 160 samples of 16-bit mono PCM (10 ms at 16 kHz), e.g. read from AudioRecord
short[] audioFrame = new short[160];

Vad vad = new Vad(VadConfig.newBuilder()
        .setSampleRate(VadConfig.SampleRate.SAMPLE_RATE_16K)
        .setFrameSize(VadConfig.FrameSize.FRAME_SIZE_160)
        .setMode(VadConfig.Mode.VERY_AGGRESSIVE)
        .setSilenceDurationMillis(500)
        .setVoiceDurationMillis(500)
        .build());

vad.start();

// Pass each captured frame together with the listener
vad.addContinuousSpeechListener(audioFrame, new VadListener() {
    @Override
    public void onSpeechDetected() {
        // speech detected!
    }

    @Override
    public void onNoiseDetected() {
        // noise detected!
    }
});

vad.stop();
```
- Speech detector: designed to classify small audio frames as speech or noise and return a result for every frame. It does not work for long utterances, because any pause between words already produces a negative result.
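```java
// audioFrame: 160 samples of 16-bit mono PCM (10 ms at 16 kHz)
short[] audioFrame = new short[160];

Vad vad = new Vad(VadConfig.newBuilder()
        .setSampleRate(VadConfig.SampleRate.SAMPLE_RATE_16K)
        .setFrameSize(VadConfig.FrameSize.FRAME_SIZE_160)
        .setMode(VadConfig.Mode.VERY_AGGRESSIVE)
        .build());

vad.start();

// true if this single frame is classified as speech
boolean isSpeech = vad.isSpeech(audioFrame);

vad.stop();
```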
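Both detectors take a `short[]` frame whose length matches the configured frame size. Android's `AudioRecord` with `ENCODING_PCM_16BIT` can read directly into a `short[]`, but if the audio arrives as raw bytes (for example, 16-bit PCM from a WAV file), it must be converted and split into frames first. A minimal sketch, assuming little-endian byte order; `PcmFramer.toFrames` is a hypothetical helper, not part of the library, and it drops an incomplete trailing frame rather than padding it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class PcmFramer {
    // Hypothetical helper: splits 16-bit little-endian mono PCM bytes into
    // frames of frameSize samples; an incomplete trailing frame is dropped
    static List<short[]> toFrames(byte[] pcm, int frameSize) {
        short[] samples = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(samples);

        List<short[]> frames = new ArrayList<>();
        for (int start = 0; start + frameSize <= samples.length; start += frameSize) {
            short[] frame = new short[frameSize];
            System.arraycopy(samples, start, frame, 0, frameSize);
            frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) {
        // 1 second of silence at 16 kHz = 16000 samples = 32000 bytes
        byte[] pcm = new byte[32000];
        List<short[]> frames = toFrames(pcm, 160);
        System.out.println(frames.size()); // 100
    }
}
```

Each resulting frame can then be passed to `vad.isSpeech(...)` or `vad.addContinuousSpeechListener(...)` in order.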
Android VAD supports Android 4.1 (Jelly Bean) and later.
To open the project in Android Studio:
- Go to the File menu or the Welcome Screen.
- Click Open...
- Navigate to VAD's root directory.
- Select `settings.gradle`.
Gradle is the only supported build configuration, so just add the dependency to your project's `build.gradle` file:
- Add the JitPack repository to your root `build.gradle` at the end of `repositories`:
```groovy
allprojects {
    repositories {
        maven { url 'https://jitpack.io' }
    }
}
```
- Add the dependency:
```groovy
dependencies {
    implementation 'com.github.gkonovalov:android-vad:1.0.1'
}
```
You can also download the precompiled AAR library and APK files from the GitHub Releases page.
Georgiy Konovalov 2021 (c) MIT License