/awesome-diarization

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

License: Apache-2.0

Awesome Speaker Diarization

Overview

The purpose of this repo is to organize the world’s resources for speaker diarization, and make them universally accessible and useful.

To add items to this page, send a pull request (see the contributing guide).

Publications

2019

2018

2017

2016

2015

2014

2013

2011

2010

2009

2008

2006

Software

Framework

| Link | Language | Description |
| --- | --- | --- |
| SIDEKIT for diarization (s4d) | Python | An open-source extension of SIDEKIT for speaker diarization. |
| pyAudioAnalysis | Python | Python audio analysis library: feature extraction, classification, segmentation, and applications. |
| AaltoASR | Python & Perl | Speaker diarization scripts based on AaltoASR. |
| LIUM SpkDiarization | Java | LIUM_SpkDiarization is software dedicated to speaker diarization (i.e. speaker segmentation and clustering). It is written in Java and includes the most recent developments in the domain (as of 2013). |
| kaldi-asr | Bash | Example scripts for speaker diarization on a portion of CALLHOME used in the 2000 NIST speaker recognition evaluation. |
| Alize LIA_SpkSeg | C++ | ALIZÉ is an open-source platform for speaker recognition. LIA_SpkSeg is its set of tools for speaker diarization. |
| pyannote-audio | Python | Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding. |
| pyBK | Python | Speaker diarization using binary key speaker modelling. A computationally light solution that does not require external training data. |
| Speaker-Diarization | Python | Speaker diarization using uis-rnn and GhostVLAD. An easier way to support open-set speakers. |
| EEND | Python & Bash & Perl | End-to-End Neural Diarization. |
| VBDiarization | Python | Speaker diarization based on Kaldi x-vectors, using a pretrained model trained in Kaldi (kaldi-asr/kaldi), converted to ONNX format (onnx/onnx), and run in ONNX Runtime (Microsoft/onnxruntime). |

Evaluation

| Link | Language | Description |
| --- | --- | --- |
| pyannote-metrics | Python | A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems. |
| SimpleDER | Python | A lightweight library to compute Diarization Error Rate (DER). |
| NIST md-eval | Perl | (1) modified md-eval.pl from Mary Tai Knox; (2) md-eval-v21.pl from jitendra; (3) md-eval-22.pl from nryant |
| dscore | Python & Perl | Diarization scoring tools. |
| Sequence Match Accuracy | Python | Compute the match accuracy of two label sequences using the Hungarian algorithm. |
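
The sequence match accuracy idea listed above can be illustrated with a small sketch. Speaker labels produced by a diarization system are arbitrary, so two label sequences are compared under the best one-to-one relabeling. The function name `sequence_match_accuracy` is hypothetical and is not the API of any library in this list. For brevity, this sketch swaps the Hungarian algorithm for an exhaustive search over label mappings, which gives the same optimum but only scales to a handful of speakers.

```python
from itertools import permutations


def sequence_match_accuracy(reference, hypothesis):
    """Best fraction of agreeing positions over all one-to-one relabelings.

    Hypothetical helper for illustration. Uses brute force instead of the
    Hungarian algorithm, so the cost grows factorially with the label count.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same length")
    if not reference:
        return 1.0

    # Unique labels in order of first appearance.
    ref_labels = list(dict.fromkeys(reference))
    hyp_labels = list(dict.fromkeys(hypothesis))

    # counts[(h, r)]: positions where the hypothesis says h and the reference says r.
    counts = {}
    for r, h in zip(reference, hypothesis):
        counts[(h, r)] = counts.get((h, r), 0) + 1

    # Pad the smaller label set with None so that labels may stay unmatched.
    size = max(len(ref_labels), len(hyp_labels))
    ref_padded = ref_labels + [None] * (size - len(ref_labels))
    hyp_padded = hyp_labels + [None] * (size - len(hyp_labels))

    best = 0
    # Try every one-to-one mapping from hypothesis labels to reference labels.
    for perm in permutations(ref_padded):
        matched = sum(counts.get((h, r), 0) for h, r in zip(hyp_padded, perm))
        best = max(best, matched)
    return best / len(reference)
```

For example, `sequence_match_accuracy([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to `1.0`, because swapping the two hypothesis labels makes the sequences identical. For realistic numbers of speakers, an assignment solver such as the Hungarian algorithm would replace the permutation loop.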

Clustering

| Link | Language | Description |
| --- | --- | --- |
| uis-rnn | Python & PyTorch | Google's Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, for Fully Supervised Speaker Diarization. This clustering algorithm is supervised. |
| uis-rnn-sml | Python & PyTorch | A variant of UIS-RNN, for the paper Supervised Online Diarization with Sample Mean Loss for Multi-Domain Data. |
| DNC | Python & ESPnet | Transformer-based Discriminative Neural Clustering (DNC) for Speaker Diarisation. Like UIS-RNN, it is supervised. |
| SpectralCluster | Python | Spectral clustering with affinity matrix refinement operations. |
| sklearn.cluster | Python | scikit-learn clustering algorithms. |
| PLDA | Python | Probabilistic Linear Discriminant Analysis & classification, written in Python. |
| PLDA | C++ | Open-source implementation of simplified PLDA (Probabilistic Linear Discriminant Analysis). |

Speaker embedding

| Link | Method | Language | Description |
| --- | --- | --- | --- |
| Speaker_Verification | d-vector | Python & TensorFlow | TensorFlow implementation of generalized end-to-end loss for speaker verification. |
| PyTorch_Speaker_Verification | d-vector | Python & PyTorch | PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al. With UIS-RNN integration. |
| Real-Time Voice Cloning | d-vector | Python & PyTorch | Implementation of "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" (SV2TTS) with a vocoder that works in real time. |
| deep-speaker | d-vector | Python & Keras | Third-party implementation of the Baidu paper Deep Speaker: an End-to-End Neural Speaker Embedding System. |
| x-vector-kaldi-tf | x-vector | Python & TensorFlow & Perl | TensorFlow implementation of the x-vector topology on top of a Kaldi recipe. |
| kaldi-ivector | i-vector | C++ & Perl | Extension to Kaldi implementing the standard i-vector hyperparameter estimation and i-vector extraction procedure. |
| voxceleb-ivector | i-vector | Perl | VoxCeleb1 i-vector based speaker recognition system. |

Speaker change detection

| Link | Language | Description |
| --- | --- | --- |
| change_detection | Python & Keras | Code for Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks. |

Other software

| Link | Language | Description |
| --- | --- | --- |
| VB Diarization | Python | VB Diarization with Eigenvoice and HMM Priors. |

Datasets

| Audio | Diarization ground truth | Language | Pricing | Additional information |
| --- | --- | --- | --- | --- |
| 2000 NIST Speaker Recognition Evaluation | Disk-6 (Switchboard), Disk-8 (CALLHOME) | Multiple | $2400.00 | Evaluation Plan |
| 2003 NIST Rich Transcription Evaluation Data | Included with the audio | en, ar, zh | $2000.00 | Telephone speech, broadcast news |
| CALLHOME American English Speech | CALLHOME American English Transcripts | en | $1500.00 + $1000.00 | CH109 whitelist |
| The ICSI Meeting Corpus | Included with the audio | en | Free | License |
| The AMI Meeting Corpus | Included with the audio (needs to be processed) | Multiple | Free | License |
| Fisher English Training Speech Part 1 Speech | Fisher English Training Speech Part 1 Transcripts | en | $7000.00 + $1000.00 | |
| Fisher English Training Part 2, Speech | Fisher English Training Part 2, Transcripts | en | $7000.00 + $1000.00 | |

Leaderboards

Other learning materials

Tech blog

Video tutorials

Products

| Company | Product |
| --- | --- |
| Google | Google Cloud Speech-to-Text API |
| Amazon | Amazon Transcribe |
| IBM | Watson Speech To Text API |
| DeepAffects | Speaker Diarization API |