Awesome Speaker Diarization
This is a curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
The purpose of this repo is to organize the world’s resources for speaker diarization, and make them universally accessible and useful.
To add items to this page, simply send a pull request (see the contributing guide).
Joint diarization and ASR
| Link | Language | Description |
| --- | --- | --- |
| SIDEKIT for diarization (s4d) | Python | An open-source extension of SIDEKIT for speaker diarization. |
| pyAudioAnalysis | Python | Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications. |
| AaltoASR | Python & Perl | Speaker diarization scripts, based on AaltoASR. |
| LIUM SpkDiarization | Java | LIUM_SpkDiarization is software dedicated to speaker diarization (i.e. speaker segmentation and clustering). It is written in Java and includes the most recent developments in the domain (as of 2013). |
| kaldi-asr | Bash | Example scripts for speaker diarization on a portion of CALLHOME used in the 2000 NIST speaker recognition evaluation. |
| Alize LIA_SpkSeg | C++ | ALIZÉ is an open-source platform for speaker recognition. LIA_SpkSeg provides the tools for speaker diarization. |
| pyannote-audio | Python | Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding. |
| pyBK | Python | Speaker diarization using binary key speaker modelling. A computationally light solution that does not require external training data. |
| Speaker-Diarization | Python | Speaker diarization using uis-rnn and GhostVLAD. An easier way to support open-set speakers. |
| EEND | Python & Bash & Perl | End-to-End Neural Diarization. |
| VBDiarization | Python | Speaker diarization based on Kaldi x-vectors, using a pretrained model trained in Kaldi (kaldi-asr/kaldi) and converted to ONNX format (onnx/onnx), running in ONNX Runtime (Microsoft/onnxruntime). |
| RE-VERB | Python & JavaScript | RE: VERB is a speaker diarization system that lets the user send or record audio of a conversation and receive timestamps of who spoke when. |
| Link | Language | Description |
| --- | --- | --- |
| uis-rnn | Python & PyTorch | Google's Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, for Fully Supervised Speaker Diarization. This clustering algorithm is supervised. |
| uis-rnn-sml | Python & PyTorch | A variant of UIS-RNN, for the paper Supervised Online Diarization with Sample Mean Loss for Multi-Domain Data. |
| DNC | Python & ESPnet | Transformer-based Discriminative Neural Clustering (DNC) for Speaker Diarisation. Like UIS-RNN, it is supervised. |
| SpectralCluster | Python | Spectral clustering with affinity matrix refinement operations. |
| sklearn.cluster | Python | scikit-learn clustering algorithms. |
| PLDA | Python | Probabilistic Linear Discriminant Analysis & classification, written in Python. |
| PLDA | C++ | Open-source implementation of simplified PLDA (Probabilistic Linear Discriminant Analysis). |
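
As a rough illustration of how the unsupervised clustering tools above fit into a diarization pipeline, here is a minimal sketch that clusters per-window speaker embeddings with scikit-learn and merges the resulting labels into "who spoke when" segments. Everything specific in it is an assumption made for the example: the embeddings are synthetic random vectors standing in for real d-vectors or x-vectors, the helper names `cluster_embeddings` and `labels_to_segments` are hypothetical, the window length is fixed at one second, and the number of speakers is given in advance. It uses plain `sklearn.cluster.SpectralClustering` on a cosine affinity matrix and does not reproduce the refinement operations of the SpectralCluster library.

```python
import numpy as np
from sklearn.cluster import SpectralClustering


def cluster_embeddings(embeddings, n_speakers):
    """Assign a speaker label to each embedding via spectral clustering."""
    # L2-normalize so the dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Map cosine similarity from [-1, 1] to [0, 1]: affinities must be non-negative.
    affinity = (1.0 + unit @ unit.T) / 2.0
    model = SpectralClustering(
        n_clusters=n_speakers, affinity="precomputed", random_state=0
    )
    return model.fit_predict(affinity)


def labels_to_segments(labels, window_seconds):
    """Merge runs of identical labels into (start, end, speaker) tuples."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the sequence or on a label change.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append(
                (start * window_seconds, i * window_seconds, int(labels[start]))
            )
            start = i
    return segments


# Synthetic stand-in for real speaker embeddings: two speakers, 16 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 16))
turns = [0] * 5 + [1] * 4 + [0] * 3  # assumed ground-truth speaker per window
embeddings = centers[turns] + 0.05 * rng.normal(size=(len(turns), 16))

labels = cluster_embeddings(embeddings, n_speakers=2)
segments = labels_to_segments(labels, window_seconds=1.0)
print(segments)
```

The cluster IDs are arbitrary, so only the segment boundaries and the grouping of windows are meaningful. In practice the embeddings would come from one of the speaker embedding extractors listed on this page, and the number of speakers usually has to be estimated rather than supplied.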
| Link | Method | Language | Description |
| --- | --- | --- | --- |
| resemble-ai/Resemblyzer | d-vector | Python & PyTorch | PyTorch implementation of generalized end-to-end loss for speaker verification, which can be used for voice cloning and diarization. |
| Speaker_Verification | d-vector | Python & TensorFlow | TensorFlow implementation of generalized end-to-end loss for speaker verification. |
| PyTorch_Speaker_Verification | d-vector | Python & PyTorch | PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al. With UIS-RNN integration. |
| Real-Time Voice Cloning | d-vector | Python & PyTorch | Implementation of "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" (SV2TTS) with a vocoder that works in real-time. |
| deep-speaker | d-vector | Python & Keras | Third-party implementation of the Baidu paper Deep Speaker: an End-to-End Neural Speaker Embedding System. |
| x-vector-kaldi-tf | x-vector | Python & TensorFlow & Perl | TensorFlow implementation of the x-vector topology on top of a Kaldi recipe. |
| kaldi-ivector | i-vector | C++ & Perl | Extension to Kaldi implementing the standard i-vector hyperparameter estimation and i-vector extraction procedure. |
| voxceleb-ivector | i-vector | Perl | VoxCeleb1 i-vector based speaker recognition system. |
| Link | Language | Description |
| --- | --- | --- |
| change_detection | Python & Keras | Code for Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks. |
| Link | Language | Description |
| --- | --- | --- |
| VB Diarization | Python | VB Diarization with Eigenvoice and HMM Priors. |
Speaker embedding training sets
| Name | Utterances | Speakers | Language | Pricing | Additional information |
| --- | --- | --- | --- | --- | --- |
| TIMIT | 6K+ | 630 | en | $250.00 | Published in 1993, the TIMIT corpus of read speech is one of the earliest speaker recognition datasets. |
| VCTK | 43K+ | 109 | en | Free | Most sentences were selected from a newspaper, plus the Rainbow Passage and an elicitation paragraph intended to identify the speaker's accent. |
| LibriSpeech | 292K | 2K+ | en | Free | Large-scale (1000 hours) corpus of read English speech. |
| LibriVox | 180K | 9K+ | Multiple | Free | Free public domain audiobooks. LibriSpeech is a processed subset of LibriVox. Each original unsegmented utterance can be very long. |
| VoxCeleb 1&2 | 1M+ | 7K | Multiple | Free | VoxCeleb is an audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube. |
| The Spoken Wikipedia Corpora | 5K | 879 | en, de, nl | Free | Volunteer readers reading Wikipedia articles. |
| CN-Celeb | 130K+ | 1K | zh | Free | A Free Chinese Speaker Recognition Corpus Released by CSLT@Tsinghua University. |
| DeepMine | 540K | 1850 | fa, en | Unknown | A speech database in Persian and English designed to build and evaluate speaker verification, as well as Persian ASR systems. |