# pyannote.audio

`pyannote.audio` is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines.
```python
# 1. visit hf.co/pyannote/speaker-diarization and hf.co/pyannote/segmentation
#    and accept user conditions (only if requested)
# 2. visit hf.co/settings/tokens to create an access token (only if you had to go through 1.)

# 3. instantiate pretrained speaker diarization pipeline
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="ACCESS_TOKEN_GOES_HERE")

# 4. apply pretrained pipeline
diarization = pipeline("audio.wav")

# 5. print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
```
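The `itertracks` loop yields one `(segment, track, label)` tuple per speaker turn, so post-processing the result is plain Python. As a minimal sketch of one common next step, here is how total speaking time per speaker could be computed (using `(start, end, speaker)` tuples to stand in for the pipeline output above):

```python
from collections import defaultdict

# (start, end, speaker) tuples standing in for the pipeline output above
turns = [(0.2, 1.5, "speaker_0"), (1.8, 3.9, "speaker_1"), (4.2, 5.7, "speaker_0")]

def speaking_time(turns):
    """Sum the duration of each speaker's turns."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)

print(speaking_time(turns))
```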
For version 2.x of `pyannote.audio`, I decided to rewrite almost everything from scratch.
Highlights of this release are:
- 🤯 much better performance (see Benchmark)
- 🐍 Python-first API
- 🤗 pretrained pipelines (and models) on 🤗 model hub
- ⚡ multi-GPU training with pytorch-lightning
- 🎛️ data augmentation with torch-audiomentations
- 💥 Prodigy recipes for model-assisted audio annotation
Only Python 3.8+ is officially supported (though it might work with Python 3.7).

```shell
conda create -n pyannote python=3.8
conda activate pyannote

# pytorch 1.11 is required for speechbrain compatibility
# (see https://pytorch.org/get-started/previous-versions/#v1110)
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 -c pytorch

pip install pyannote.audio
```
- Changelog
- Models
  - Available tasks explained
  - Applying a pretrained model
  - Training, fine-tuning, and transfer learning
- Pipelines
  - Available pipelines explained
  - Applying a pretrained pipeline
  - Training a pipeline
- Contributing
  - Adding a new model
  - Adding a new task
  - Adding a new pipeline
  - Sharing pretrained models and pipelines
- Blog
  - 2022-10-23 > "One speaker segmentation model to rule them all"
  - 2021-08-05 > "Streaming voice activity detection with pyannote.audio"
- Miscellaneous
  - Training with `pyannote-audio-train` command line tool
  - Annotating your own data with Prodigy
  - Speaker verification
  - Visualization and debugging
📝 Written in lower case: `pyannote.audio` (or `pyannote` if you are lazy). Not PyAnnote nor PyAnnotate (sic).
📢 Pronounced like the French verb pianoter. pi like in piano, not py like in python.
🎹 pianoter means to play the piano (hence the logo 🤯).
Pretrained pipelines do not produce good results on my data. What can I do?

1. Annotate dozens of conversations manually and separate them into development and test subsets in `pyannote.database`.
2. Optimize the hyper-parameters of the pretrained pipeline using the development set. If performance is still not good enough, go to step 3.
3. Annotate hundreds of conversations manually and set them up as training subset in `pyannote.database`.
4. Fine-tune the models (on which the pipeline relies) using the training set.
5. Optimize the hyper-parameters of the pipeline with the fine-tuned models, using the development set. If performance is still not good enough, go back to step 3.
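Setting up development, test, and training subsets relies on `pyannote.database` protocols described in a `database.yml` file. As a rough sketch only (the database name, protocol name, and file paths below are made up, and the exact schema may differ across `pyannote.database` versions), such a file could look like:

```yaml
Databases:
  # where to find audio files ({uri} is replaced by each file's identifier)
  MyDatabase: /path/to/audio/{uri}.wav

Protocols:
  MyDatabase:
    SpeakerDiarization:
      MyProtocol:
        train:
          uri: lists/train.lst          # one file identifier per line
          annotation: rttms/train.rttm  # manual speaker annotations
        development:
          uri: lists/dev.lst
          annotation: rttms/dev.rttm
        test:
          uri: lists/test.lst
          annotation: rttms/test.rttm
```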
Out of the box, the `pyannote.audio` default speaker diarization pipeline is expected to be much better (and faster) in v2.x than in v1.1. The numbers below are diarization error rates (in %):
| Dataset \ Version | v1.1 | v2.0 | v2.1.1 (finetuned) |
| --- | --- | --- | --- |
| AISHELL-4 | - | 14.6 | 14.1 (14.5) |
| AliMeeting (channel 1) | - | - | 27.4 (23.8) |
| AMI (IHM) | 29.7 | 18.2 | 18.9 (18.5) |
| AMI (SDM) | - | 29.0 | 27.1 (22.2) |
| CALLHOME (part2) | - | 30.2 | 32.4 (29.3) |
| DIHARD 3 (full) | 29.2 | 21.0 | 26.9 (21.9) |
| VoxConverse (v0.3) | 21.5 | 12.6 | 11.2 (10.7) |
| REPERE (phase2) | - | 12.6 | 8.2 (8.3) |
| This American Life | - | - | 20.8 (15.2) |
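For reference, diarization error rate is the ratio of mis-attributed speech time (false alarm + missed detection + speaker confusion) to total speech time. A minimal arithmetic sketch, with made-up durations:

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total_speech):
    """DER = (false alarm + missed detection + speaker confusion) / total speech."""
    return (false_alarm + missed_detection + confusion) / total_speech

# made-up durations, in seconds
der = diarization_error_rate(false_alarm=3.0, missed_detection=5.0,
                             confusion=4.0, total_speech=100.0)
print(f"{100 * der:.1f}%")  # 12.0%
```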
If you use `pyannote.audio`, please use the following citations:
```bibtex
@inproceedings{Bredin2020,
  Title = {{pyannote.audio: neural building blocks for speaker diarization}},
  Author = {{Bredin}, Herv{\'e} and {Yin}, Ruiqing and {Coria}, Juan Manuel and {Gelly}, Gregory and {Korshunov}, Pavel and {Lavechin}, Marvin and {Fustes}, Diego and {Titeux}, Hadrien and {Bouaziz}, Wassim and {Gill}, Marie-Philippe},
  Booktitle = {ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing},
  Year = {2020},
}

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Year = {2021},
}
```
For commercial enquiries and scientific consulting, please contact me.
The commands below will set up pre-commit hooks and packages needed for developing the `pyannote.audio` library.

```shell
pip install -e .[dev,testing]
pre-commit install
```
Tests rely on a set of debugging files available in the `tests/data` directory. Set the `PYANNOTE_DATABASE_CONFIG` environment variable to `tests/data/database.yml` before running tests:

```shell
PYANNOTE_DATABASE_CONFIG=tests/data/database.yml pytest
```