PyTorch implementation of "Spectro-Temporal Attention-Based Voice Activity Detection": https://ieeexplore.ieee.org/document/8933025
My PyTorch implementation of STAM performs slightly better than the original TensorFlow one on every metric below except precision, which is marginally lower (for DCF, lower is better):
TensorFlow: Global AUC: 99.86, F1-score: 98.15, DCF: 1.32, accuracy: 97.90, precision: 99.10
PyTorch: Global AUC: 99.87, F1-score: 98.31, DCF: 1.18, accuracy: 98.07, precision: 99.06
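
For readers who want to see how metrics of this kind are typically computed for frame-level VAD, here is a minimal pure-Python sketch. It is not the evaluation code used for the numbers above. The function names `auc` and `vad_metrics`, the 0.5 decision threshold, and the DCF weighting (0.75 on the miss rate, 0.25 on the false-alarm rate) are all assumptions made for illustration; this README does not state how DCF is defined.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (O(n^2), fine for a sketch)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one speech and one non-speech frame")
    # Count speech/non-speech pairs where the speech frame scores higher; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def vad_metrics(labels, scores, threshold=0.5, miss_weight=0.75):
    """Frame-level metrics from 0/1 labels and speech scores.

    threshold and miss_weight are illustrative assumptions, not values from this repo.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Assumed DCF: weighted sum of miss rate and false-alarm rate (lower is better).
    p_miss = fn / (tp + fn) if tp + fn else 0.0
    p_fa = fp / (fp + tn) if fp + tn else 0.0
    dcf = miss_weight * p_miss + (1.0 - miss_weight) * p_fa

    return {
        "auc": auc(labels, scores),
        "f1": f1,
        "dcf": dcf,
        "accuracy": accuracy,
        "precision": precision,
    }


if __name__ == "__main__":
    # Toy example: six frames, three of them speech.
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    for name, value in vad_metrics(labels, scores).items():
        print(f"{name}: {100 * value:.2f}")
```

The values are returned as fractions in [0, 1]; multiply by 100 to compare with percentage-style numbers.
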
Training data: TIMIT training set + NOISEX noise (SNR: -10, -5, 0, 5, 10 dB)
Test data: TIMIT test set + AURORA noise (SNR: -5, 0, 5, 10 dB)
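
The noisy data described above combines clean speech with noise at fixed SNRs. As a rough illustration of that idea, the sketch below scales a noise signal so that the speech-to-noise power ratio matches a target SNR in dB, then adds it to the speech. This README does not describe the actual data preparation pipeline, so the function name `mix_at_snr` and the use of whole-signal average power are assumptions.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it to the requested SNR (in dB).

    Hypothetical helper: uses average power over the whole signal, which may
    differ from how the datasets in this repo were actually prepared.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")

    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")

    # SNR_dB = 10 * log10(speech_power / (gain^2 * noise_power)), solved for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(speech, noise)]


if __name__ == "__main__":
    speech = [1.0, -1.0, 1.0, -1.0]
    noise = [0.5, 0.5, -0.5, -0.5]
    for snr_db in (-10, -5, 0, 5, 10):
        mixed = mix_at_snr(speech, noise, snr_db)
        print(snr_db, [round(v, 3) for v in mixed])
```

In practice the noise clip would first be cropped or looped to the length of the utterance, and the signals would be arrays rather than Python lists.
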