Voice Activity Detection

This is a project for AI2615 at SJTU, implemented by Qingquan Bao. The code is still under development.

Requirements

All dependencies are listed in requirements.txt. To install them, run pip install -r requirements.txt.

Temporal feature + naive classifier

In utils/time_feature_extraction.py, we implement two temporal features: ZCR (zero-crossing rate) and energy. In LRtest.py and model/state_machine.py, we implement logistic regression and state machine classifiers to detect voice activity on the development dataset.
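As an illustration of the two temporal features, here is a minimal NumPy sketch of per-frame short-time energy and zero-crossing rate. It is not the code in utils/time_feature_extraction.py; the function names, the framing scheme, and the convention of treating exact zeros as positive are all assumptions made for this example.

```python
import numpy as np

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Trailing samples that do not fill a whole frame are dropped.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames.astype(np.float64) ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # assumption: treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

# Toy signal that flips sign at every sample (hypothetical data).
toy = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
toy_frames = frame_signal(toy, frame_len=4, hop_len=2)
toy_energy = short_time_energy(toy_frames)
toy_zcr = zero_crossing_rate(toy_frames)
```

Voiced speech tends to show high energy and a low ZCR, while unvoiced sounds and noise tend to show the opposite, which is why the two features are often used together.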

To predict labels for new data, run python vad4test.py --model=LR --featType=Time --testdirPath=<path to your test file directory> --outPath=<path of the output .txt file>
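For readers unfamiliar with state machine classifiers, the sketch below shows one common design: a two-state machine with two energy thresholds and a short hangover. It is only an assumption about what such a classifier can look like, not a description of model/state_machine.py; the function name, the thresholds, and the min_silence parameter are hypothetical.

```python
def state_machine_vad(energies, high, low, min_silence=3):
    """Label each frame 1 (speech) or 0 (non-speech).

    The machine enters the speech state when the energy rises above `high`
    and leaves it only after `min_silence` consecutive frames fall below `low`.
    """
    labels = []
    in_speech = False
    quiet_run = 0  # consecutive low-energy frames seen while in speech
    for e in energies:
        if not in_speech:
            if e > high:
                in_speech = True
                quiet_run = 0
        elif e < low:
            quiet_run += 1
            if quiet_run >= min_silence:
                in_speech = False
                quiet_run = 0
        else:
            quiet_run = 0
        labels.append(1 if in_speech else 0)
    return labels

# Hypothetical per-frame energies, already normalized to [0, 1].
demo_labels = state_machine_vad(
    [0.1, 0.9, 0.5, 0.1, 0.1, 0.1, 0.1], high=0.8, low=0.3, min_silence=2
)
```

Using two thresholds instead of one keeps the output from flickering when the energy hovers around a single cut-off value.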

Spectral feature + GMM

Spectral features are extracted in utils/spectralFeature.py, where we implement FBank and MFCC. In gmm.py, we implement MFCC+GMM.
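The following NumPy sketch shows how log mel filterbank (FBank) energies are typically computed from framed audio. It is not the code in utils/spectralFeature.py: the function names, the Hamming window, and the defaults (40 filters, 512-point FFT, 16 kHz sample rate) are assumptions chosen for illustration. MFCCs are usually obtained by applying a DCT to these log energies; that step is omitted here.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Return an (n_filters, n_fft // 2 + 1) matrix of triangular mel filters."""
    # Filter edges are equally spaced on the mel scale from 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_fbank(frames, n_filters=40, n_fft=512, sample_rate=16000):
    """Log mel filterbank energies for frames of shape (n_frames, frame_len).

    Assumes frame_len <= n_fft; longer frames would be truncated by the FFT.
    """
    window = np.hamming(frames.shape[1])
    spectrum = np.fft.rfft(frames * window, n=n_fft, axis=1)
    power = (np.abs(spectrum) ** 2) / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(energies + 1e-10)  # small constant avoids log(0)

# Five random 25 ms frames at 16 kHz (hypothetical data).
rng = np.random.default_rng(0)
demo_feats = log_fbank(rng.standard_normal((5, 400)))
```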

To predict labels for new data with MFCC+GMM, run python gmm.py.

To predict labels for new data through vad4test.py, run python vad4test.py --model=GMM --featType=MFCC --testdirPath=<path to your test file directory> --outPath=<path of the output .txt file>
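One common way to use GMMs for VAD is to fit one mixture on speech frames and another on non-speech frames, then label each new frame by whichever model gives the higher log-likelihood. Whether gmm.py follows this scheme is not stated here, so the sketch below is only an illustration under that assumption. It covers scoring only (no training), assumes diagonal covariances, and uses hypothetical function names and parameters.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    x: (n_frames, dim); weights: (k,); means and variances: (k, dim).
    """
    diff = x[:, None, :] - means[None, :, :]                            # (n, k, dim)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (k,)
    log_exp = -0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (n, k)
    log_comp = np.log(weights)[None, :] + log_norm[None, :] + log_exp
    # Log-sum-exp over components, shifted by the max for numerical stability.
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.sum(np.exp(log_comp - m), axis=1, keepdims=True))).ravel()

def gmm_vad(x, speech_gmm, noise_gmm):
    """Return 1 where the speech model is more likely than the noise model."""
    speech_ll = gmm_log_likelihood(x, *speech_gmm)
    noise_ll = gmm_log_likelihood(x, *noise_gmm)
    return (speech_ll > noise_ll).astype(int)

# Hypothetical single-component, 1-D models: (weights, means, variances).
speech_model = (np.array([1.0]), np.array([[5.0]]), np.array([[1.0]]))
noise_model = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
demo_pred = gmm_vad(np.array([[0.0], [5.0], [4.0], [1.0]]), speech_model, noise_model)
```

In practice the feature vectors would be MFCC frames and the mixture parameters would come from training on labeled data.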

Spectral feature + LSTM

The model is implemented in model/lstm.py; the architecture currently supports only the MEL40 feature.

To predict labels for new data, run python vad4test.py --model=LSTM --featType=MEL --testdirPath=<path to your test file directory> --outPath=<path of the output .txt file>

Result