Awesome Speaker Diarization

Overview
Publications
Software
Datasets
Leaderboards
Other learning materials
- Tech blog
- Video tutorials
Products

Overview

This is a curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

The purpose of this repo is to organize the world’s resources for speaker diarization, and make them universally accessible and useful.

To add items to this page, simply send a pull request. (contributing guide)

Publications

2019

2018

Fully Supervised Speaker Diarization
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
ODESSA at Albayzin Speaker Diarization Challenge 2018
Joint Discriminative Embedding Learning, Speech Activity and Overlap Detection for the DIHARD Challenge
Neural speech turn segmentation and affinity propagation for speaker diarization
Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks

2017

2016

A Speaker Diarization System for Studying Peer-Led Team Learning Groups

2015

Diarization resegmentation in the factor analysis subspace

2014

2013

Unsupervised methods for speaker diarization: An integrated and iterative approach

2011

2006

Software

Framework

Link	Language	Description
SIDEKIT for diarization (s4d)	Python	An open-source extension package of SIDEKIT for speaker diarization.
pyAudioAnalysis	Python	Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications.
AaltoASR	Python & Perl	Speaker diarization scripts, based on AaltoASR.
LIUM SpkDiarization	Java	LIUM_SpkDiarization is software dedicated to speaker diarization (i.e. speaker segmentation and clustering). It is written in Java and includes the most recent developments in the domain (as of 2013).
kaldi-asr	Bash	Example scripts for speaker diarization on a portion of CALLHOME used in the 2000 NIST speaker recognition evaluation.
Alize LIA_SpkSeg	C++	ALIZÉ is an open-source platform for speaker recognition. LIA_SpkSeg provides the tools for speaker diarization.
pyannote-audio	Python	Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding.
pyBK	Python	Speaker diarization using binary key speaker modelling. Computationally light solution that does not require external training data.
Speaker-Diarization	Python	Speaker diarization using uis-rnn and GhostVLAD. An easier way to support open-set speakers.
EEND	Python & Bash & Perl	End-to-End Neural Diarization.
VBDiarization	Python	Speaker diarization based on Kaldi x-vectors, using a pretrained model trained in Kaldi (kaldi-asr/kaldi) and converted to ONNX format (onnx/onnx), running in ONNX Runtime (Microsoft/onnxruntime).

Evaluation

Link	Language	Description
pyannote-metrics	Python	A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems.
SimpleDER	Python	A lightweight library to compute Diarization Error Rate (DER).
NIST md-eval	Perl	(1) modified md-eval.pl from Mary Tai Knox; (2) md-eval-v21.pl from jitendra; (3) md-eval-22.pl from nryant
dscore	Python & Perl	Diarization scoring tools.
Sequence Match Accuracy	Python	Compute the match accuracy between two sequences using the Hungarian algorithm.
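
To illustrate what a sequence match accuracy measures, here is a minimal sketch in plain Python. It is not the code of the Sequence Match Accuracy library listed above: it replaces the Hungarian algorithm with an exhaustive search over label mappings, which finds the same optimum but is only practical for a handful of speakers. The function name `sequence_match_accuracy` and the `_UNMATCHED` sentinel are hypothetical names chosen for this example.

```python
from itertools import permutations

# Hypothetical sentinel: a hypothesis label mapped to it matches nothing.
_UNMATCHED = object()


def sequence_match_accuracy(ref, hyp):
    """Fraction of positions where hyp agrees with ref, maximized over
    one-to-one mappings from hypothesis labels to reference labels.

    Assumption: exhaustive search stands in for the Hungarian algorithm,
    so this is only suitable for small numbers of distinct labels.
    """
    if len(ref) != len(hyp):
        raise ValueError("sequences must have the same length")
    if not ref:
        return 1.0

    # Distinct labels, in order of first appearance.
    ref_labels = list(dict.fromkeys(ref))
    hyp_labels = list(dict.fromkeys(hyp))

    # Pad the reference labels so every hypothesis label has a target,
    # even when the hypothesis uses more labels than the reference.
    extra = max(0, len(hyp_labels) - len(ref_labels))
    targets = ref_labels + [_UNMATCHED] * extra

    best = 0
    for perm in permutations(targets, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        matches = sum(1 for r, h in zip(ref, hyp) if mapping[h] == r)
        best = max(best, matches)
    return best / len(ref)


if __name__ == "__main__":
    # Speaker labels are arbitrary: "a"/"b" need not equal 0/1/2.
    print(sequence_match_accuracy([0, 0, 1, 1, 2], ["a", "a", "b", "b", "b"]))
```

Because diarization systems assign arbitrary speaker labels, a score like this has to search for the best label correspondence before counting agreements. For larger label sets, a polynomial-time assignment solver would be the appropriate choice.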

Clustering

Link	Language	Description
uis-rnn	Python & PyTorch	Google's Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, for Fully Supervised Speaker Diarization. This clustering algorithm is supervised.
uis-rnn-sml	Python & PyTorch	A variant of UIS-RNN, for the paper Supervised Online Diarization with Sample Mean Loss for Multi-Domain Data.
DNC	Python & ESPnet	Transformer-based Discriminative Neural Clustering (DNC) for Speaker Diarisation. Like UIS-RNN, it is supervised.
SpectralCluster	Python	Spectral clustering with affinity matrix refinement operations.
sklearn.cluster	Python	scikit-learn clustering algorithms.
PLDA	Python	Probabilistic Linear Discriminant Analysis & classification, written in Python.
PLDA	C++	Open-source implementation of simplified PLDA (Probabilistic Linear Discriminant Analysis).

Speaker embedding

Link	Method	Language	Description
Speaker_Verification	d-vector	Python & TensorFlow	TensorFlow implementation of generalized end-to-end loss for speaker verification.
PyTorch_Speaker_Verification	d-vector	Python & PyTorch	PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al. With UIS-RNN integration.
Real-Time Voice Cloning	d-vector	Python & PyTorch	Implementation of "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" (SV2TTS) with a vocoder that works in real-time.
deep-speaker	d-vector	Python & Keras	Third-party implementation of the Baidu paper Deep Speaker: an End-to-End Neural Speaker Embedding System.
x-vector-kaldi-tf	x-vector	Python & TensorFlow & Perl	TensorFlow implementation of the x-vector topology on top of a Kaldi recipe.
kaldi-ivector	i-vector	C++ & Perl	Extension to Kaldi implementing the standard i-vector hyperparameter estimation and i-vector extraction procedure.
voxceleb-ivector	i-vector	Perl	VoxCeleb1 i-vector based speaker recognition system.

Speaker change detection

Link	Language	Description
change_detection	Python & Keras	Code for Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks.

Other software

Link	Language	Description
VB Diarization	Python	VB Diarization with Eigenvoice and HMM Priors.

Datasets

Audio	Diarization ground truth	Language	Pricing	Additional information
2000 NIST Speaker Recognition Evaluation	Disk-6 (Switchboard), Disk-8 (CALLHOME)	Multiple	$2400.00	Evaluation Plan
2003 NIST Rich Transcription Evaluation Data	Included with the audio	en, ar, zh	$2000.00	telephone speech, broadcast news
CALLHOME American English Speech	CALLHOME American English Transcripts	en	$1500.00 + $1000.00	CH109 whitelist
The ICSI Meeting Corpus	Included with the audio	en	Free	License
The AMI Meeting Corpus	Included with the audio (needs to be processed)	Multiple	Free	License
Fisher English Training Speech Part 1 Speech	Fisher English Training Speech Part 1 Transcripts	en	$7000.00 + $1000.00
Fisher English Training Part 2, Speech	Fisher English Training Part 2, Transcripts	en	$7000.00 + $1000.00

Leaderboards

Other learning materials

Tech blog

Video tutorials

Google's Diarization System: Speaker Diarization with LSTM by Google
Fully Supervised Speaker Diarization: Say Goodbye to clustering by Google
Speaker Diarization: Optimal Clustering and Learning Speaker Embeddings by Microsoft Research
Robust Speaker Diarization for Meetings: the ICSI system by Microsoft Research

Products

Company	Product
Google	Google Cloud Speech-to-Text API
Amazon	Amazon Transcribe
IBM	Watson Speech To Text API
DeepAffects	Speaker Diarization API

Naminwang/awesome-diarization