This is a curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
The purpose of this repo is to organize the world’s resources for speaker diarization, and make them universally accessible and useful.
To add items to this page, send a pull request (see the contributing guide).
- Supervised online diarization with sample mean loss for multi-domain data
- Overlap-aware diarization: resegmentation using neural end-to-end overlapped speech detection
- Speaker diarization using latent space clustering in generative adversarial network
- A study of semi-supervised speaker diarization system using gan mixture model
- Learning deep representations by multilayer bootstrap networks for speaker diarization
- Discriminative Neural Clustering for Speaker Diarisation
- End-to-End Neural Speaker Diarization with Permutation-Free Objectives
- End-to-End Neural Speaker Diarization with Self-attention
- Enhancements for Audio-only Diarization Systems
- LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization
- Joint Speech Recognition and Speaker Diarization via Sequence Transduction
- Meeting Transcription Using Virtual Microphone Arrays
- Speaker diarisation using 2D self-attentive combination of embeddings
- Fully Supervised Speaker Diarization
- Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
- ODESSA at Albayzin Speaker Diarization Challenge 2018
- Joint Discriminative Embedding Learning, Speech Activity and Overlap Detection for the DIHARD Challenge
- Neural speech turn segmentation and affinity propagation for speaker diarization
- Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks
- Speaker Diarization with LSTM
- Speaker diarization using deep neural network embeddings
- Speaker diarization using convolutional neural network for statistics accumulation refinement
- pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
- Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks
- Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings
- A study of the cosine distance-based mean shift for telephone speech diarization
- Speaker diarization with PLDA i-vector scoring and unsupervised calibration
- Artificial neural network features for speaker diarization
- PLDA-based Clustering for Speaker Diarization of Broadcast Streams
- Speaker diarization of meetings based on speaker role n-gram models
- An overview of automatic speaker diarization systems
- A spectral clustering approach to speaker diarization
Link | Language | Description |
---|---|---|
SIDEKIT for diarization (s4d) | Python | An open source package extension of SIDEKIT for Speaker diarization. |
pyAudioAnalysis | Python | Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications. |
AaltoASR | Python & Perl | Speaker diarization scripts, based on AaltoASR. |
LIUM SpkDiarization | Java | LIUM_SpkDiarization is software dedicated to speaker diarization (i.e. speaker segmentation and clustering). It is written in Java and includes the most recent developments in the domain (as of 2013). |
kaldi-asr | Bash | Example scripts for speaker diarization on a portion of CALLHOME used in the 2000 NIST speaker recognition evaluation. |
Alize LIA_SpkSeg | C++ | ALIZÉ is an open-source platform for speaker recognition. LIA_SpkSeg is its set of tools for speaker diarization. |
pyannote-audio | Python | Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding. |
pyBK | Python | Speaker diarization using binary key speaker modelling. Computationally light solution that does not require external training data. |
Speaker-Diarization | Python | Speaker diarization using uis-rnn and GhostVLAD. An easier way to support open-set speakers. |
EEND | Python & Bash & Perl | End-to-End Neural Diarization. |
VBDiarization | Python | Speaker diarization based on Kaldi x-vectors using pretrained model trained in Kaldi (kaldi-asr/kaldi) and converted to ONNX format (onnx/onnx) running in ONNXRuntime (Microsoft/onnxruntime). |
Link | Language | Description |
---|---|---|
pyannote-metrics | Python | A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems. |
SimpleDER | Python | A lightweight library to compute Diarization Error Rate (DER). |
NIST md-eval | Perl | (1) modified md-eval.pl from Mary Tai Knox; (2) md-eval-v21.pl from jitendra; (3) md-eval-22.pl from nryant |
dscore | Python & Perl | Diarization scoring tools. |
Sequence Match Accuracy | Python | Compute the best-match accuracy between two label sequences using the Hungarian algorithm. |
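
To illustrate what a sequence match accuracy measures, here is a minimal sketch, not the code of any tool listed above. Diarization labels are arbitrary, so hypothesis labels must first be mapped one-to-one onto reference labels before counting matches. The listed tool uses the Hungarian algorithm for that mapping; this sketch swaps in a brute-force search over permutations, which gives the same optimum but is only practical for a handful of speakers. The function name `sequence_match_accuracy` and the sentinel `_UNMATCHED` are hypothetical names chosen for this example.

```python
from itertools import permutations

# Placeholder target for hypothesis labels that have no reference label to map to.
_UNMATCHED = object()


def sequence_match_accuracy(reference, hypothesis):
    """Return the best accuracy over all one-to-one label mappings.

    Brute-force stand-in for the Hungarian algorithm: every injective mapping
    from hypothesis labels to reference labels is tried, so the cost grows
    factorially with the number of distinct labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same length")
    if not reference:
        return 1.0

    # Distinct labels in order of first appearance (no ordering required).
    ref_labels = list(dict.fromkeys(reference))
    hyp_labels = list(dict.fromkeys(hypothesis))

    # Pad the targets so that every hypothesis label can be assigned something;
    # frames mapped to the placeholder always count as errors.
    n = max(len(ref_labels), len(hyp_labels))
    targets = ref_labels + [_UNMATCHED] * (n - len(ref_labels))

    best_correct = 0
    for perm in permutations(targets, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        correct = sum(
            1 for r, h in zip(reference, hypothesis) if mapping[h] == r
        )
        best_correct = max(best_correct, correct)
    return best_correct / len(reference)


if __name__ == "__main__":
    ref = ["A", "A", "B", "B", "C"]
    hyp = ["x", "x", "y", "y", "y"]
    print(sequence_match_accuracy(ref, hyp))  # 0.8
```

In the usage example, mapping `x` to `A` and `y` to `B` matches four of the five frames; the frame labeled `C` cannot be matched because the hypothesis has only two labels.
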
Link | Language | Description |
---|---|---|
uis-rnn | Python & PyTorch | Google's Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, for Fully Supervised Speaker Diarization. This clustering algorithm is supervised. |
uis-rnn-sml | Python & PyTorch | A variant of UIS-RNN, for the paper Supervised Online Diarization with Sample Mean Loss for Multi-Domain Data. |
DNC | Python & ESPnet | Transformer-based Discriminative Neural Clustering (DNC) for Speaker Diarisation. Like UIS-RNN, it is supervised. |
SpectralCluster | Python | Spectral clustering with affinity matrix refinement operations. |
sklearn.cluster | Python | scikit-learn clustering algorithms. |
PLDA | Python | Probabilistic Linear Discriminant Analysis & classification, written in Python. |
PLDA | C++ | Open-source implementation of simplified PLDA (Probabilistic Linear Discriminant Analysis). |
Link | Method | Language | Description |
---|---|---|---|
Speaker_Verification | d-vector | Python & TensorFlow | TensorFlow implementation of generalized end-to-end loss for speaker verification. |
PyTorch_Speaker_Verification | d-vector | Python & PyTorch | PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al. With UIS-RNN integration. |
Real-Time Voice Cloning | d-vector | Python & PyTorch | Implementation of "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" (SV2TTS) with a vocoder that works in real-time. |
deep-speaker | d-vector | Python & Keras | Third-party implementation of the Baidu paper Deep Speaker: an End-to-End Neural Speaker Embedding System. |
x-vector-kaldi-tf | x-vector | Python & TensorFlow & Perl | TensorFlow implementation of x-vector topology on top of Kaldi recipe. |
kaldi-ivector | i-vector | C++ & Perl | Extension to Kaldi implementing the standard i-vector hyperparameter estimation and i-vector extraction procedure. |
voxceleb-ivector | i-vector | Perl | VoxCeleb1 i-vector based speaker recognition system. |
Link | Language | Description |
---|---|---|
change_detection | Python & Keras | Code for Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks. |
Link | Language | Description |
---|---|---|
VB Diarization | Python | VB Diarization with Eigenvoice and HMM Priors. |
Audio | Diarization ground truth | Language | Pricing | Additional information |
---|---|---|---|---|
2000 NIST Speaker Recognition Evaluation | Disk-6 (Switchboard), Disk-8 (CALLHOME) | Multiple | $2400.00 | Evaluation Plan |
2003 NIST Rich Transcription Evaluation Data | Included with the audio | en, ar, zh | $2000.00 | telephone speech, broadcast news |
CALLHOME American English Speech | CALLHOME American English Transcripts | en | $1500.00 + $1000.00 | CH109 whitelist |
The ICSI Meeting Corpus | Included with the audio | en | Free | License |
The AMI Meeting Corpus | Included with the audio (needs processing) | Multiple | Free | License |
Fisher English Training Speech Part 1 Speech | Fisher English Training Speech Part 1 Transcripts | en | $7000.00 + $1000.00 | |
Fisher English Training Part 2, Speech | Fisher English Training Part 2, Transcripts | en | $7000.00 + $1000.00 | |
- Literature Review For Speaker Change Detection by Halil Erdoğan
- Speaker Diarization: Separation of Multiple Speakers in an Audio File by Jaspreet Singh
- Google's Diarization System: Speaker Diarization with LSTM by Google
- Fully Supervised Speaker Diarization: Say Goodbye to clustering by Google
- Speaker Diarization: Optimal Clustering and Learning Speaker Embeddings by Microsoft Research
- Robust Speaker Diarization for Meetings: the ICSI system by Microsoft Research
Company | Product |
---|---|
Google | Google Cloud Speech-to-Text API |
Amazon | Amazon Transcribe |
IBM | Watson Speech To Text API |
DeepAffects | Speaker Diarization API |