Spectro-Temporal Attention-Based Voice Activity Detection (PyTorch)

PyTorch implementation of "Spectro-Temporal Attention-Based Voice Activity Detection": https://ieeexplore.ieee.org/document/8933025

My implementation of STAM performs slightly better than the original TensorFlow one on every reported metric except precision, which is marginally lower (for DCF, lower is better):

TensorFlow: Global AUC: 99.86, F1-score: 98.15, DCF: 1.32, accuracy: 97.90, precision: 99.10

PyTorch: Global AUC: 99.87, F1-score: 98.31, DCF: 1.18, accuracy: 98.07, precision: 99.06
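For readers unfamiliar with these metrics, here is a minimal sketch of how frame-level F1-score, DCF, accuracy, and precision can be computed from binary speech/non-speech decisions. It is not the evaluation script that produced the numbers above. The function name `vad_metrics` is hypothetical, and the DCF weighting (0.75 for misses, 0.25 for false alarms) is an assumption that this README does not state; adjust it to match the paper's definition. AUC is left out because it needs continuous scores rather than hard decisions.

```python
def vad_metrics(labels, predictions, miss_weight=0.75, fa_weight=0.25):
    """Frame-level VAD metrics from binary labels (1 = speech, 0 = non-speech).

    miss_weight and fa_weight are ASSUMED DCF weights, not taken from this README.
    Returned values are fractions in [0, 1]; multiply by 100 for percentages.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    # Confusion-matrix counts over all frames.
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    total = tp + fp + fn + tn

    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Detection cost function: weighted sum of miss and false-alarm rates.
    p_miss = fn / (tp + fn) if (tp + fn) else 0.0
    p_fa = fp / (fp + tn) if (fp + tn) else 0.0
    dcf = miss_weight * p_miss + fa_weight * p_fa

    return {"accuracy": accuracy, "precision": precision,
            "f1": f1, "dcf": dcf}


# Toy example: 8 frames, one missed speech frame and one false alarm.
example_labels = [1, 1, 1, 1, 0, 0, 0, 0]
example_predictions = [1, 1, 1, 0, 1, 0, 0, 0]
example_metrics = vad_metrics(example_labels, example_predictions)
```

In practice the predictions would come from thresholding the model's per-frame speech probabilities, so every metric except AUC depends on the chosen threshold.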

Training and testing data

Training: TIMIT training data + NOISEX noise (SNR: -10, -5, 0, 5, 10 dB)

Testing: TIMIT testing data + AURORA noise (SNR: -5, 0, 5, 10 dB)
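Combining clean speech with noise at a fixed SNR usually means scaling the noise so that the speech-to-noise power ratio hits the target before adding the two signals. The sketch below illustrates that idea in plain Python. It is an assumption about how such mixtures are typically built, not the data preparation code used for this repository, and the names `mean_power` and `mix_at_snr` are hypothetical.

```python
import math


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the result has the requested SNR in dB.

    Assumes both signals share the same sample rate and that the noise
    is at least as long as the speech; the noise is truncated to fit.
    """
    if not speech:
        raise ValueError("speech must not be empty")
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")

    noise = noise[:len(speech)]
    speech_power = mean_power(speech)
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise segment is silent; cannot scale it")

    # SNR_dB = 10 * log10(P_speech / P_noise), so solve for the noise power
    # that yields the target SNR, then convert it to an amplitude gain.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)

    return [s + gain * n for s, n in zip(speech, noise)]


# Toy example with synthetic signals standing in for TIMIT and noise clips.
toy_speech = [math.sin(0.1 * i) for i in range(1000)]
toy_noise = [math.cos(0.37 * i) for i in range(1000)]
toy_mixtures = {snr: mix_at_snr(toy_speech, toy_noise, snr)
                for snr in (-10, -5, 0, 5, 10)}
```

With real recordings, the lists would be replaced by waveform arrays or tensors, and a random noise segment would normally be drawn for each utterance.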