

License: Apache License 2.0

Awesome Speaker Diarization


Overview

This is a curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

The purpose of this repo is to organize the world’s resources for speaker diarization, and make them universally accessible and useful.

To add items to this page, simply send a pull request.

Publications

2019

2018

2017

2016

2015

2014

2013

2011

2010

2009

2008

2006

Software

Framework

Link | Language | Description
SIDEKIT for diarization (s4d) | Python | An open-source extension package of SIDEKIT for speaker diarization.
pyAudioAnalysis | Python | Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications.
AaltoASR | Python & Perl | Speaker diarization scripts based on AaltoASR.
LIUM_SpkDiarization | Java | Software dedicated to speaker diarization (i.e., speaker segmentation and clustering). Written in Java, it includes the most recent developments in the field (as of 2013).
kaldi-asr | Bash | Example scripts for speaker diarization on the portion of CALLHOME used in the 2000 NIST speaker recognition evaluation.
Alize LIA_SpkSeg | C++ | ALIZÉ is an open-source platform for speaker recognition; LIA_SpkSeg provides its speaker diarization tools.
pyannote-audio | Python | Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding.
pyBK | Python | Speaker diarization using binary key speaker modelling. A computationally light solution that does not require external training data.
Speaker-Diarization | Python | Speaker diarization using uis-rnn and GhostVLAD. An easier way to support open-set speakers.
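Most of the frameworks above end up producing speaker-labelled time segments, and the NIST md-eval scripts and dscore listed under Evaluation read such hypotheses in the RTTM text format. The sketch below is a minimal illustration, not code from any listed package: it collapses per-frame speaker labels into segments and emits one RTTM `SPEAKER` line per segment. The per-frame input, the `frame_shift` value, and the names `labels_to_segments` and `to_rttm` are hypothetical; check the exact field layout against the documentation of the scoring tool you use.

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[float, float, str]  # (onset in seconds, duration in seconds, speaker)


def labels_to_segments(labels: Sequence[Optional[str]], frame_shift: float) -> List[Segment]:
    """Merge runs of identical per-frame labels; None marks non-speech and is dropped."""
    segments: List[Segment] = []
    start = 0
    for t in range(1, len(labels) + 1):
        # Close the current run at the end of input or when the label changes.
        if t == len(labels) or labels[t] != labels[start]:
            speaker = labels[start]
            if speaker is not None:
                segments.append((start * frame_shift, (t - start) * frame_shift, speaker))
            start = t
    return segments


def to_rttm(file_id: str, segments: Sequence[Segment]) -> List[str]:
    """Format segments as RTTM SPEAKER lines (channel 1, unused fields set to <NA>)."""
    return [
        f"SPEAKER {file_id} 1 {onset:.2f} {duration:.2f} <NA> <NA> {speaker} <NA> <NA>"
        for onset, duration, speaker in segments
    ]
```

For example, `to_rttm("rec1", labels_to_segments(["A", "A", None, "B", "B", "B"], 0.5))` yields one line for speaker A starting at 0.00 s and one for speaker B starting at 1.50 s.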

Evaluation

Link | Language | Description
pyannote-metrics | Python | A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems.
SimpleDER | Python | A lightweight library to compute Diarization Error Rate (DER).
modified NIST md-eval.pl | Perl | From Mary Tai Knox
NIST md-eval-v21.pl | Perl | From jitendra
NIST md-eval-22.pl | Perl | From nryant
dscore | Python & Perl | Diarization scoring tools.
Sequence Match Accuracy | Python | Computes the matching accuracy of two label sequences using the Hungarian algorithm.
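To make the metric concrete: DER is the fraction of reference speech that is missed, falsely detected, or attributed to the wrong speaker, after hypothesis speakers have been mapped one-to-one onto reference speakers so as to maximise their overlap. The following is a simplified frame-level sketch, assuming one speaker per frame, no forgiveness collar, and `None` for non-speech; the scoring tools above work on time segments and also handle overlapping speech. For brevity it finds the mapping by brute force over permutations, which reaches the same optimal total overlap as the Hungarian algorithm but is only practical for a handful of speakers. The names `best_mapping` and `frame_der` are hypothetical.

```python
from itertools import permutations
from typing import Dict, Optional, Sequence, Tuple

Label = Optional[str]  # None marks a non-speech frame


def best_mapping(ref: Sequence[Label], hyp: Sequence[Label]) -> Dict[str, Label]:
    """Map each hypothesis speaker to at most one reference speaker, maximising
    the number of frames on which the mapped labels agree."""
    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})
    overlap: Dict[Tuple[str, str], int] = {}
    for r, h in zip(ref, hyp):
        if r is not None and h is not None:
            overlap[(h, r)] = overlap.get((h, r), 0) + 1
    # Pad with None so that surplus hypothesis speakers can remain unmapped.
    pool = ref_spk + [None] * max(0, len(hyp_spk) - len(ref_spk))
    best: Dict[str, Label] = {}
    best_score = -1
    for perm in permutations(pool, len(hyp_spk)):
        score = sum(overlap.get((h, r), 0) for h, r in zip(hyp_spk, perm))
        if score > best_score:
            best, best_score = dict(zip(hyp_spk, perm)), score
    return best


def frame_der(ref: Sequence[Label], hyp: Sequence[Label]) -> float:
    """(missed + false alarm + confusion) frames divided by reference speech frames."""
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must have the same number of frames")
    total = sum(r is not None for r in ref)
    if total == 0:
        raise ValueError("reference contains no speech")
    mapping = best_mapping(ref, hyp)
    miss = false_alarm = confusion = 0
    for r, h in zip(ref, hyp):
        if r is not None and h is None:
            miss += 1
        elif r is None and h is not None:
            false_alarm += 1
        elif r is not None and mapping[h] != r:
            confusion += 1
    return (miss + false_alarm + confusion) / total
```

For example, `frame_der(["A", "A", "A", "B", "B", None], ["x", "x", "y", "y", "y", "y"])` maps x to A and y to B, then counts one confused frame and one false alarm over five reference speech frames, giving 0.4.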

Clustering

Link | Language | Description
uis-rnn | Python & PyTorch | Google's Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, for Fully Supervised Speaker Diarization.
SpectralCluster | Python | Spectral clustering with affinity matrix refinement operations.
sklearn.cluster | Python | scikit-learn clustering algorithms.
PLDA | Python | Probabilistic Linear Discriminant Analysis & classification, written in Python.
PLDA | C++ | Open-source implementation of simplified PLDA (Probabilistic Linear Discriminant Analysis).
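Many diarization back-ends cluster fixed-length segment embeddings. The sketch below uses agglomerative hierarchical clustering (average linkage on cosine similarity, stopping at a similarity threshold), which is a different algorithm from the spectral clustering in SpectralCluster or the supervised UIS-RNN; it only illustrates the general idea in dependency-free Python. The `threshold` value and the function names are hypothetical and a real threshold would need tuning on development data; `sklearn.cluster.AgglomerativeClustering` provides an optimised implementation of this family of methods.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("zero-norm embedding")
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def ahc_cluster(embeddings: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Average-linkage AHC: repeatedly merge the most similar pair of clusters
    until the best average cosine similarity falls below `threshold`."""
    sim = [[cosine(a, b) for b in embeddings] for a in embeddings]
    clusters: List[List[int]] = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        best_pair, best_score = (0, 1), -math.inf
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean pairwise similarity between the two clusters.
                total = sum(sim[a][b] for a in clusters[i] for b in clusters[j])
                score = total / (len(clusters[i]) * len(clusters[j]))
                if score > best_score:
                    best_pair, best_score = (i, j), score
        if best_score < threshold:
            break  # no remaining pair is similar enough to be the same speaker
        i, j = best_pair
        clusters[i].extend(clusters[j])  # j > i, so deleting j keeps index i valid
        del clusters[j]
    labels = [0] * len(embeddings)
    for k, members in enumerate(clusters):
        for m in members:
            labels[m] = k
    return labels
```

For example, `ahc_cluster([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], threshold=0.5)` returns `[0, 0, 1, 1]`. The naive pairwise search is roughly cubic in the number of segments, which is acceptable for a short illustration but not for long recordings.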

Speaker embedding

Link | Method | Language | Description
Speaker_Verification | d-vector | Python & TensorFlow | TensorFlow implementation of generalized end-to-end loss for speaker verification.
PyTorch_Speaker_Verification | d-vector | Python & PyTorch | PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al., with UIS-RNN integration.
x-vector-kaldi-tf | x-vector | Python & TensorFlow & Perl | TensorFlow implementation of the x-vector topology on top of a Kaldi recipe.
kaldi-ivector | i-vector | C++ & Perl | Extension to Kaldi implementing the standard i-vector hyperparameter estimation and i-vector extraction procedure.
voxceleb-ivector | i-vector | Perl | VoxCeleb1 i-vector based speaker recognition system.
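The two d-vector entries above implement the generalized end-to-end (GE2E) loss of Wan, Li et al. In its softmax form, each utterance embedding e_ji (speaker j, utterance i) is scored against every speaker centroid c_k as S_ji,k = w · cos(e_ji, c_k) + b, where the centroid of the utterance's own speaker is computed without e_ji, and the loss for that utterance is -S_ji,j + log Σ_k exp(S_ji,k). The pure-Python sketch below sums this over a batch of N speakers × M utterances. It is written for illustration and is not code from either repository: w and b are learned parameters in the paper but fixed constants here, and the default values and function names are assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _cos(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def _mean(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def ge2e_softmax_loss(batch: Sequence[Sequence[Vector]], w: float = 10.0, b: float = -5.0) -> float:
    """batch[j][i] is the embedding of utterance i from speaker j (N x M, M >= 2).
    w should stay positive so that higher similarity means a higher score."""
    if any(len(spk) < 2 for spk in batch):
        raise ValueError("each speaker needs at least two utterances")
    centroids = [_mean(spk) for spk in batch]
    loss = 0.0
    for j, spk in enumerate(batch):
        for i, e in enumerate(spk):
            logits = []
            for k in range(len(batch)):
                if k == j:
                    # Own-speaker centroid computed without utterance i itself.
                    c = _mean([u for t, u in enumerate(spk) if t != i])
                else:
                    c = centroids[k]
                logits.append(w * _cos(e, c) + b)
            # Numerically stable log-sum-exp over all speaker centroids.
            top = max(logits)
            lse = top + math.log(sum(math.exp(s - top) for s in logits))
            loss += lse - logits[j]
    return loss
```

A batch in which each speaker's utterances point in a shared direction should give a much smaller loss than the same vectors assigned to the wrong speakers, which is what training with this loss pushes the embedding network towards.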

Other software

Link | Language | Description
VB Diarization | Python | VB Diarization with Eigenvoice and HMM Priors.

Datasets

Audio | Diarization ground truth | Language | Pricing | Additional information
2000 NIST Speaker Recognition Evaluation | Disk-6 (Switchboard), Disk-8 (CALLHOME) | Multiple | $2400.00 | Evaluation Plan
2003 NIST Rich Transcription Evaluation Data | Included with the audio | en, ar, zh | $2000.00 | telephone speech, broadcast news
CALLHOME American English Speech | CALLHOME American English Transcripts | en | $1500.00 + $1000.00 | CH109 whitelist
The ICSI Meeting Corpus | Included with the audio | en | Free | License
The AMI Meeting Corpus | Included with the audio (requires processing) | Multiple | Free | License
Fisher English Training Speech Part 1 Speech | Fisher English Training Speech Part 1 Transcripts | en | $7000.00 + $1000.00 |
Fisher English Training Part 2, Speech | Fisher English Training Part 2, Transcripts | en | $7000.00 + $1000.00 |

Leaderboards

Other learning materials

Tech blog

Video tutorials

Products

Company | Product
Google | Google Cloud Speech-to-Text API
Amazon | Amazon Transcribe
IBM | Watson Speech To Text API
DeepAffects | Speaker Diarization API