Roadmap | Paper | Runtime | Python binding | Pretrained Models | Huggingface Demo
WeSpeaker mainly focuses on speaker embedding learning, with application to the speaker verification task. We support online feature extraction or loading pre-extracted features in Kaldi format.
- Clone this repo

``` sh
git clone https://github.com/wenet-e2e/wespeaker.git
```

- Create a conda env (PyTorch >= 1.10.0 is required):

``` sh
conda create -n wespeaker python=3.9
conda activate wespeaker
conda install pytorch=1.12.1 torchaudio=0.12.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
```

- If you just want to use the pretrained models, try the Python binding:

``` sh
pip3 install wespeakerruntime
```
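The runtime and Python binding produce fixed-dimensional speaker embeddings; deciding whether two utterances share a speaker then reduces to comparing their embeddings, most commonly with cosine scoring. A minimal, dependency-free sketch (the function name is our own, not part of the WeSpeaker API):

```python
import math

def cosine_score(emb1, emb2):
    """Cosine similarity between two speaker embeddings.

    Returns a value in [-1, 1]; higher means the two utterances are
    more likely to come from the same speaker.
    """
    dot = sum(a * b for a, b in zip(emb1, emb2))
    norm1 = math.sqrt(sum(a * a for a in emb1))
    norm2 = math.sqrt(sum(b * b for b in emb2))
    return dot / (norm1 * norm2)
```

In practice the score is compared against a threshold tuned on a development set (see the EER/minDCF metrics below).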
- 2023.07.18: Support the kaldi-compatible PLDA and unsupervised adaptation, see #186.
- 2023.07.14: Support the NIST SRE16 recipe, see #177.
- 2023.07.10: Support the self-supervised learning recipe on VoxCeleb, including DINO, MoCo and SimCLR, see #180.
- 2023.06.30: Support the SphereFace2 loss function, with better performance and noise robustness compared with ArcMargin Softmax, see #173.
- 2023.04.27: Support the CAM++ model, with better performance and a lower single-thread inference RTF compared with the ResNet34 model, see #153.
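The updates above mention margin-based softmax variants such as ArcMargin Softmax. The core idea is to make classification harder for the target speaker by adding an angular margin to its logit before scaling, which tightens class clusters in the embedding space. A hedged, illustrative sketch (the function name, margin, and scale values are arbitrary examples, not WeSpeaker's defaults):

```python
import math

def arc_margin_logit(cos_theta, margin=0.2, scale=32.0, is_target=True):
    """Additive angular margin (ArcFace-style) logit.

    For the target class, add `margin` to the angle between the
    embedding and the class weight, then rescale; non-target logits
    are only rescaled. Parameter values here are illustrative.
    """
    if is_target:
        # Clamp to avoid domain errors from floating-point drift.
        theta = math.acos(max(-1.0, min(1.0, cos_theta)))
        return scale * math.cos(theta + margin)
    return scale * cos_theta
```

Because cos(theta + m) < cos(theta) for angles in (0, pi - m), the target logit is always penalized relative to plain softmax, forcing the network to learn more discriminative embeddings.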
- VoxCeleb: Speaker Verification recipe on the VoxCeleb dataset
- 🔥 UPDATE 2023.07.10: We support a self-supervised learning recipe on VoxCeleb, achieving 2.627% EER (ECAPA_TDNN_GLOB_c1024) on the vox1-O-clean test set without any labels.
- 🔥 UPDATE 2022.10.31: We support deep r-vector up to the 293-layer version, achieving 0.447%/0.043 EER/minDCF on the vox1-O-clean test set.
- 🔥 UPDATE 2022.07.19: We apply the same setups as the CNCeleb recipe and obtain SOTA performance among open-source systems.
- EER/minDCF on vox1-O-clean test set are 0.723%/0.069 (ResNet34) and 0.728%/0.099 (ECAPA_TDNN_GLOB_c1024), after LM fine-tuning and AS-Norm
- CNCeleb: Speaker Verification recipe on the CnCeleb dataset
- NIST SRE16: Speaker Verification recipe for the 2016 NIST Speaker Recognition Evaluation Plan. A similar recipe can be found in Kaldi.
- 🔥 UPDATE 2023.07.14: We support the NIST SRE16 recipe. After PLDA adaptation, we achieved 6.608%, 10.01%, and 2.974% EER on the Pooled, Tagalog, and Cantonese trials, respectively.
- VoxConverse: Diarization recipe on the VoxConverse dataset
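The VoxCeleb results above are reported after AS-Norm. Adaptive score normalization z-normalizes a raw trial score against the top-k most competitive cohort scores from both the enrollment and test sides, then averages the two normalized scores. A simplified sketch (cohort-selection details vary across implementations; this is not WeSpeaker's exact code):

```python
import statistics

def as_norm(raw_score, enroll_cohort, test_cohort, top_k=3):
    """Adaptive symmetric score normalization (AS-Norm), simplified.

    `enroll_cohort` / `test_cohort` are the scores of the enrollment
    and test embeddings against a cohort of impostor speakers. Only
    the top-k most similar cohort scores are used for normalization.
    """
    def znorm(score, cohort):
        top = sorted(cohort, reverse=True)[:top_k]
        mu = statistics.mean(top)
        sd = statistics.stdev(top)
        return (score - mu) / sd

    return 0.5 * (znorm(raw_score, enroll_cohort)
                  + znorm(raw_score, test_cohort))
```

Normalizing against an adaptive cohort compensates for utterance-dependent score shifts, which typically improves calibration and EER, and it is what makes a single global threshold usable across trials.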
- Model (SOTA Models)
- Pooling Functions
- TAP(mean) / TSDP(std) / TSTP(mean+std)
- Comparison of mean/std pooling can be found in shuai_iscslp, anna_arxiv
- Attentive Statistics Pooling (ASTP)
- Mainly for ECAPA_TDNN
- Multi-Query and Multi-Head Attentive Statistics Pooling (MQMHASTP)
- Details can be found in MQMHASTP
- Criteria
- Scoring
- Cosine
- PLDA
- Score Normalization (AS-Norm)
- Metric
- EER
- minDCF
- Online Augmentation
- Noise && RIR
- Speed Perturb
- SpecAug
- Training Strategy
- Well-designed Learning Rate and Margin Schedulers
- Large Margin Fine-tuning
- Automatic Mixed Precision (AMP) Training
- Runtime
- Python Binding
- Triton Inference Server for verification && diarization in GPU deployment
- C++ Onnxruntime
- Self-Supervised Learning (SSL)
- Literature
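Of the metrics listed above, EER is the operating point where the false-acceptance and false-rejection rates are equal. A naive threshold-sweep sketch (production toolkits usually interpolate the ROC curve instead of picking the closest discrete point):

```python
def compute_eer(target_scores, nontarget_scores):
    """Equal Error Rate via a simple threshold sweep.

    Sweeps every observed score as a threshold and returns the mean of
    FAR and FRR at the point where they are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(target_scores + nontarget_scores):
        # False acceptance: non-target trials scored at or above threshold.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: target trials scored below threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

minDCF extends this idea by weighting the two error types with application-dependent costs and priors and taking the minimum of the weighted sum over all thresholds.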
For Chinese users, you can scan the QR code on the left to follow the official account of the WeNet Community.
We also created a WeChat group for better discussion and quicker response. Please scan the QR code on the right to join the chat group.
If you find wespeaker useful, please cite it as
``` bibtex
@article{wang2022wespeaker,
  title={Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit},
  author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
  journal={arXiv preprint arXiv:2210.17016},
  year={2022}
}
```
If you are interested in contributing, feel free to contact @wsstriving or @robin1001.