This is a curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
The purpose of this repo is to organize the world’s resources for speaker diarization, and make them universally accessible and useful.
To add items to this page, simply send a pull request.
- Meeting Transcription Using Virtual Microphone Arrays
- Speaker diarisation using 2D self-attentive combination of embeddings
- Fully Supervised Speaker Diarization
- Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
- JHU Diarization System Description
- ODESSA at Albayzin Speaker Diarization Challenge 2018
- Neural speech turn segmentation and affinity propagation for speaker diarization
- Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks
- Speaker Diarization with LSTM
- Speaker diarization using deep neural network embeddings
- Speaker diarization using convolutional neural network for statistics accumulation refinement
- pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
- Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks
- Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings
- A study of the cosine distance-based mean shift for telephone speech diarization
- Speaker diarization with PLDA i-vector scoring and unsupervised calibration
- Artificial neural network features for speaker diarization
- PLDA-based Clustering for Speaker Diarization of Broadcast Streams
- Speaker diarization of meetings based on speaker role n-gram models
- An overview of automatic speaker diarization systems
- A spectral clustering approach to speaker diarization

Framework | Language | Description |
---|---|---|
SIDEKIT for diarization (s4d) | Python | An open-source extension package of SIDEKIT for speaker diarization. |
pyAudioAnalysis | Python | Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications. |
AaltoASR | Python & Perl | Speaker diarization scripts, based on AaltoASR. |
LIUM_SpkDiarization | Java | LIUM_SpkDiarization is software dedicated to speaker diarization (i.e., speaker segmentation and clustering). It is written in Java and includes the most recent developments in the domain (as of 2013). |
kaldi-asr | Bash | Example scripts for speaker diarization on a portion of CALLHOME used in the 2000 NIST speaker recognition evaluation. |
Alize LIA_SpkSeg | C++ | ALIZÉ is an open-source platform for speaker recognition. LIA_SpkSeg provides the speaker diarization tools. |
pyannote-audio | Python | Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding. |
pyBK | Python | Speaker diarization using binary key speaker modelling. Computationally light solution that does not require external training data. |
Speaker-Diarization | Python | Speaker diarization using uis-rnn and GhostVLAD. Aims to make it easier to support open-set speakers. |

Evaluation tool | Language | Description |
---|---|---|
pyannote-metrics | Python | A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems. |
SimpleDER | Python | A lightweight library to compute Diarization Error Rate (DER). |
modified NIST md-eval.pl | Perl | From Mary Tai Knox |
NIST md-eval-v21.pl | Perl | From jitendra |
NIST md-eval-22.pl | Perl | From nryant |
dscore | Python & Perl | Diarization scoring tools. |
Sequence Match Accuracy | Python | Computes the match accuracy of two label sequences using the Hungarian algorithm. |
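
The evaluation tools above all report some form of Diarization Error Rate: (missed speech + false alarm + speaker confusion) / total reference speech, computed after mapping hypothesis speakers to reference speakers one-to-one. As a rough illustration only (not the scoring logic of md-eval, dscore, pyannote-metrics, or SimpleDER, which also handle forgiveness collars, overlapping speech, and time-based segments), here is a minimal frame-level sketch. It assumes one speaker label per frame with `None` for non-speech, and `frame_der` is a hypothetical name. The mapping is found by brute force over permutations, which is only practical for a few speakers; the Hungarian algorithm (as used by Sequence Match Accuracy) solves the same assignment problem efficiently.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Hypothetical frame-level DER helper.

    reference, hypothesis: equal-length lists with one speaker label per
    frame and None for non-speech. Assumes no overlapping speech and no
    forgiveness collar.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    pairs = list(zip(reference, hypothesis))
    ref_speech = sum(1 for r, _ in pairs if r is not None)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    # Missed speech: reference speaks, hypothesis is silent.
    missed = sum(1 for r, h in pairs if r is not None and h is None)
    # False alarm: hypothesis speaks, reference is silent.
    false_alarm = sum(1 for r, h in pairs if r is None and h is not None)

    # Co-occurrence counts for frames where both sides detect speech.
    overlap = {}
    for r, h in pairs:
        if r is not None and h is not None:
            overlap[(r, h)] = overlap.get((r, h), 0) + 1
    both_speak = sum(overlap.values())

    # Unique speaker labels, in order of first appearance.
    ref_spk = list(dict.fromkeys(r for r in reference if r is not None))
    hyp_spk = list(dict.fromkeys(h for h in hypothesis if h is not None))

    # Best one-to-one mapping, i.e. the one maximizing correctly attributed
    # frames. Brute force over permutations of the larger label set.
    best = 0
    if len(ref_spk) <= len(hyp_spk):
        for perm in permutations(hyp_spk, len(ref_spk)):
            best = max(best, sum(overlap.get((r, h), 0) for r, h in zip(ref_spk, perm)))
    else:
        for perm in permutations(ref_spk, len(hyp_spk)):
            best = max(best, sum(overlap.get((r, h), 0) for r, h in zip(perm, hyp_spk)))

    # Speaker confusion: both sides speak, but the mapped labels disagree.
    confusion = both_speak - best
    return (missed + false_alarm + confusion) / ref_speech
```

For example, `frame_der(["A", "A", "A", "B", "B", None, None, "B"], ["x", "x", "y", "y", "y", "y", None, None])` finds one missed frame, one false-alarm frame, and one confused frame out of six reference speech frames, giving 0.5.
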

Clustering library | Language | Description |
---|---|---|
uis-rnn | Python & PyTorch | Google's Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, for Fully Supervised Speaker Diarization. |
SpectralCluster | Python | Spectral clustering with affinity matrix refinement operations. |
sklearn.cluster | Python | scikit-learn clustering algorithms. |
PLDA | Python | Probabilistic Linear Discriminant Analysis & classification, written in Python. |
PLDA | C++ | Open-source implementation of simplified PLDA (Probabilistic Linear Discriminant Analysis). |
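
The clustering libraries above group per-segment speaker embeddings into speakers. The sketch below deliberately uses a simpler technique than SpectralCluster or UIS-RNN: average-linkage agglomerative clustering on cosine similarity, which stops merging when no two clusters are more similar than a threshold. The function names and the default `threshold=0.5` are illustrative assumptions, not tuned values. In practice the threshold would be tuned on development data, and `sklearn.cluster.AgglomerativeClustering` provides a production implementation of this family of methods.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def agglomerative_cluster(embeddings, threshold=0.5):
    """Hypothetical helper: average-linkage agglomerative clustering with a
    cosine-similarity stopping threshold. Returns one label per embedding."""
    clusters = [[i] for i in range(len(embeddings))]
    sim = [[cosine(a, b) for b in embeddings] for a in embeddings]

    def linkage(c1, c2):
        # Average pairwise similarity between members of two clusters.
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_score, best_pair = -math.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = linkage(clusters[a], clusters[b])
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        if best_score < threshold:
            break  # no pair is similar enough: stop merging
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

With `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]` as toy 2-D embeddings, the two near-parallel pairs merge and the function returns `[0, 0, 1, 1]`.
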

Embedding implementation | Method | Language | Description |
---|---|---|---|
Speaker_Verification | d-vector | Python & TensorFlow | TensorFlow implementation of the generalized end-to-end (GE2E) loss for speaker verification. |
PyTorch_Speaker_Verification | d-vector | Python & PyTorch | PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al. With UIS-RNN integration. |
x-vector-kaldi-tf | x-vector | Python & TensorFlow & Perl | TensorFlow implementation of the x-vector topology on top of a Kaldi recipe. |
kaldi-ivector | i-vector | C++ & Perl | Extension to Kaldi implementing the standard i-vector hyperparameter estimation and i-vector extraction procedure. |
voxceleb-ivector | i-vector | Perl | VoxCeleb1 i-vector-based speaker recognition system. |
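
The d-vector implementations above produce embeddings for short windows of audio. As we understand the GE2E paper's inference recipe, an utterance-level d-vector is obtained by L2-normalizing each window embedding and averaging them element-wise; utterances are then compared with cosine similarity. The sketch below assumes the window embeddings are already computed (plain Python lists here). The function names are hypothetical, and the final re-normalization of the average is an added assumption, not something the listed repositories are confirmed to do.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def utterance_dvector(window_embeddings):
    """Hypothetical helper: average L2-normalized window embeddings into a
    single utterance-level d-vector."""
    if not window_embeddings:
        raise ValueError("need at least one window embedding")
    normed = [l2_normalize(w) for w in window_embeddings]
    dim = len(normed[0])
    mean = [sum(w[d] for w in normed) / len(normed) for d in range(dim)]
    # Assumption: re-normalize so every d-vector has unit length.
    return l2_normalize(mean)


def cosine_score(u, v):
    """Cosine similarity between two d-vectors (higher means more similar)."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))
```

For instance, window embeddings `[3.0, 4.0]` and `[6.0, 8.0]` point in the same direction, so the utterance d-vector is approximately `[0.6, 0.8]` and its score against itself is approximately 1.0.
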

Other software | Language | Description |
---|---|---|
VB Diarization | Python | VB Diarization with Eigenvoice and HMM Priors. |

Audio | Diarization ground truth | Language | Pricing | Additional information |
---|---|---|---|---|
2000 NIST Speaker Recognition Evaluation | Disk-6 (Switchboard), Disk-8 (CALLHOME) | Multiple | $2400.00 | Evaluation Plan |
2003 NIST Rich Transcription Evaluation Data | Included with the audio | en, ar, zh | $2000.00 | telephone speech, broadcast news |
CALLHOME American English Speech | CALLHOME American English Transcripts | en | $1500.00 + $1000.00 | CH109 whitelist |
The ICSI Meeting Corpus | Included with the audio | en | Free | License |
The AMI Meeting Corpus | Included with the audio (requires processing) | Multiple | Free | License |
Fisher English Training Speech Part 1 Speech | Fisher English Training Speech Part 1 Transcripts | en | $7000.00 + $1000.00 | |
Fisher English Training Part 2, Speech | Fisher English Training Part 2, Transcripts | en | $7000.00 + $1000.00 | |

- Literature Review For Speaker Change Detection by Halil Erdoğan
- Speaker Diarization: Separation of Multiple Speakers in an Audio File by Jaspreet Singh
- Google's Diarization System: Speaker Diarization with LSTM by Google
- Fully Supervised Speaker Diarization: Say Goodbye to clustering by Google
- Speaker Diarization: Optimal Clustering and Learning Speaker Embeddings by Microsoft Research
- Robust Speaker Diarization for Meetings: the ICSI system by Microsoft Research

Company | Product |
---|---|
Google | Cloud Speech-to-Text API |
Amazon | Amazon Transcribe |
IBM | Watson Speech To Text API |
DeepAffects | Speaker Diarization API |