Here, we offer an unofficial, automatically generated speaker label of the WenetSpeech dataset for speech research.
The label is formulated with the kaldi utt2spk style as follows:
X0000018313_42315061_S00016 X0000018313_42315061_spk0
X0000018313_42315061_S00009 X0000018313_42315061_spk0
X0000018313_42315061_S00011 X0000018313_42315061_spk0
X0000018313_42315061_S00033 X0000018313_42315061_spk0
X0000018313_42315061_S00010 X0000018313_42315061_spk0
The first item represents the original utterance identity of WeNetSpeech, while the second term indicates the speaker label.
The speaker label is generated through the following steps:
- Enhancing the original speech utterance using a state-of-the-art band-split RNN (BSRNN) model.
- Calculating segment-level speaker embeddings using a pre-trained speaker verification model from Wespeaker.
- For each long utterance, applying spectral clustering to the speech segments to generate the speaker labels.
Dataset | Utterance num | Segment num | Speaker number |
---|---|---|---|
Wenetspeech (ori) | 0.06M | 14.6M | - |
Wenetspeech (cluster) | 0.06M | 11.9M | 0.23M |
Note that, currently we donot apply speaker clustering across difference long utterance of wenetspeech.
The utt2spk file can be downloaded via Link.
- Automatic speaker label of Gigaspeech
Authorship: Jianwei Yu, Hangting Chen, Shuai Wang
Copyright 2023 Tencent AI Lab, Shenzhen Research Institute of Big data
License: Creative Commons Attribution 4.0 International (CC BY-NC-SA 4.0).