raphpapercup/VAD-python

Voice Activity Detector in Python

Voice Activity Detector

Python code that applies a voice activity detector to a wave file. The detector is based on the ratio between the energy in the speech band and the total energy.

Requirements

numpy
scipy
matplotlib

Basic Idea

The input audio data is processed as follows:

Convert stereo to mono
Move a 20 ms window along the audio data
Calculate the ratio between the energy in the speech band and the total energy for each window
If the ratio exceeds the threshold (0.6 by default), label the window as speech
Apply a median filter with a length of 0.5 s to smooth the detected speech regions
Represent the speech regions as time intervals
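
The first five steps above can be sketched with numpy and scipy. This is a minimal illustration, not the code in the vad module: the function name detect_speech_windows, the 300-3000 Hz speech band, and the use of non-overlapping windows are all assumptions made for the example. Only the 20 ms window, the 0.6 threshold, and the 0.5 s median filter come from the description above.

```python
import numpy as np
from scipy.signal import medfilt


def detect_speech_windows(samples, rate, window_s=0.02, band=(300.0, 3000.0),
                          threshold=0.6, smooth_s=0.5):
    """Return one boolean per window: True where the window looks like speech.

    Hypothetical helper; the band limits and non-overlapping windows are
    assumptions, not values taken from the vad module.
    """
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 2:
        # Step 1: convert stereo (n_samples, n_channels) to mono.
        samples = samples.mean(axis=1)

    # Step 2: split the signal into non-overlapping windows of window_s seconds.
    win = int(round(rate * window_s))
    n_windows = len(samples) // win
    freqs = np.fft.rfftfreq(win, d=1.0 / rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])

    labels = np.zeros(n_windows)
    for i in range(n_windows):
        frame = samples[i * win:(i + 1) * win]
        # Step 3: energy per frequency bin, then the band-to-total ratio.
        energy = np.abs(np.fft.rfft(frame)) ** 2
        total = energy.sum()
        ratio = energy[in_band].sum() / total if total > 0 else 0.0
        # Step 4: compare the ratio with the threshold.
        labels[i] = 1.0 if ratio > threshold else 0.0

    # Step 5: median filter spanning about smooth_s seconds (kernel must be odd).
    kernel = int(round(smooth_s / window_s))
    if kernel % 2 == 0:
        kernel += 1
    return medfilt(labels, kernel_size=kernel).astype(bool)
```

With a 16 kHz signal each window holds 320 samples, so the median filter spans 25 windows. Isolated windows that disagree with their neighbours are relabelled, which is what smooths the detected regions.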

How To

Create and use the detector object:

Import the vad module
Create an instance of the class VoiceActivityDetector with the full path to the wave file
Run the method that detects speech regions
Optionally, plot the original wave data and the detected speech regions

Example Python script that saves the speech intervals to a JSON file:

./detectVoiceInWave.py ./wav-sample.wav ./results.json
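
The JSON layout written by detectVoiceInWave.py is not described here. As a rough sketch of the general idea, per-window speech labels could be turned into time intervals and saved as below. The function names labels_to_intervals and save_intervals and the "start" and "end" keys are hypothetical and may differ from what the script actually writes.

```python
import json


def labels_to_intervals(labels, window_s=0.02):
    """Merge consecutive speech windows into intervals given in seconds.

    Hypothetical helper: the "start"/"end" key names are an assumption.
    """
    intervals = []
    start = None  # index of the first window of the current speech run
    for i, is_speech in enumerate(labels):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            intervals.append({"start": round(start * window_s, 3),
                              "end": round(i * window_s, 3)})
            start = None
    if start is not None:
        # The signal ended while still inside a speech run.
        intervals.append({"start": round(start * window_s, 3),
                          "end": round(len(labels) * window_s, 3)})
    return intervals


def save_intervals(intervals, path):
    """Write the list of intervals to a JSON file."""
    with open(path, "w") as f:
        json.dump(intervals, f, indent=2)
```

For example, labels_to_intervals([0, 1, 1, 0, 0, 1]) merges the two adjacent speech windows into a single interval and closes the final run at the end of the signal.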

Example Python code to plot the detected speech regions:

from vad import VoiceActivityDetector

filename = '/Users/user/wav-sample.wav'
v = VoiceActivityDetector(filename)
v.plot_detected_speech_regions()

Alexander USOLTSEV 2015 (c) MIT License