A PyTorch implementation of VGGVox for the VoxCeleb1 dataset.
pip install -r requirements.txt
python3 train.py --dir ./Data/
- 81.79% Top-1 and 93.17% Top-5 test-set accuracy, which is pretty satisfactory. Details are in results.txt.
- Training on the V100 takes 4 mins per epoch.
- Run python3 vggm.py for the model architecture.
- Best model weights uploaded: VGGM300_BEST_140_81.99.pth
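For readers unfamiliar with the Top-1 and Top-5 figures reported above, here is a minimal sketch of how such accuracies are commonly computed from per-class scores. The function name topk_accuracy and the list-of-lists input are illustrative assumptions, not code from this repository; the actual evaluation in train.py may differ.

```python
def topk_accuracy(scores, labels, k=5):
    """Fraction of examples whose true label is among the k highest scores.

    scores: list of per-class score lists, one per example (hypothetical format)
    labels: list of integer class indices, one per example
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return hits / len(labels)


if __name__ == "__main__":
    # Toy example with 3 classes and 3 utterances
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print("top-1:", topk_accuracy(scores, labels, k=1))
    print("top-2:", topk_accuracy(scores, labels, k=2))
```

With k=1 this reduces to ordinary classification accuracy; larger k only ever raises the number, which is why Top-5 is higher than Top-1.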
- All data is preprocessed exactly as in the author's MATLAB code. The results were checked and verified online in MATLAB.
- Random 3s cropped segments for training.
- Copy all hyperparameters (LR, optimizer params, batch size) from the author's net.
- Stabilize PyTorch's BatchNorm and test variants. Improved results by a small percentage.
- Try one-sided spectrogram input as mentioned on the author's GitHub.
- Port the author's network from MATLAB and train. The MATLAB model has a 1300-dimensional output; will test it later.
- Copy weights from the MATLAB network and test.
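The random 3s cropping mentioned in the list above can be sketched roughly as follows. This is an illustration only: the helper name random_crop, the 16 kHz sample rate default, and the choice to crop the raw waveform (rather than the spectrogram) are assumptions, not taken from this repository's code.

```python
import random


def random_crop(samples, sample_rate=16000, seconds=3.0, rng=random):
    """Return a random contiguous segment of `seconds` length from `samples`.

    samples: any sliceable sequence (list, NumPy array, 1-D tensor)
    sample_rate: assumed 16 kHz here; adjust to match the actual audio
    """
    crop_len = int(sample_rate * seconds)
    if len(samples) <= crop_len:
        # Too short to crop: return unchanged and let the caller decide how to pad
        return samples
    # randint is inclusive on both ends, so the last valid start is allowed
    start = rng.randint(0, len(samples) - crop_len)
    return samples[start:start + crop_len]


if __name__ == "__main__":
    audio = list(range(16000 * 5))  # stand-in for 5 s of audio
    segment = random_crop(audio)
    print(len(segment))
```

Drawing a fresh offset each time an utterance is loaded means the network sees a different 3s window of the same recording across epochs, which acts as a light form of augmentation.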
- VGGVox
- linhdvu14's vggvox-speaker-identification
- jameslyons's python_speech_features
@InProceedings{Nagrani17,
author = "Nagrani, A. and Chung, J.~S. and Zisserman, A.",
title = "VoxCeleb: a large-scale speaker identification dataset",
booktitle = "INTERSPEECH",
year = "2017",
}
@InProceedings{Chung18,
author = "Chung, J.~S. and Nagrani, A. and Zisserman, A.",
title = "VoxCeleb2: Deep Speaker Recognition",
booktitle = "INTERSPEECH",
year = "2018",
}