
Primary language: Python. License: MIT.

DiaPer 🩲

PyTorch implementation for DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors.

Usage

Getting started

We recommend creating an Anaconda environment:

    conda create -n DiaPer python=3.7
    conda activate DiaPer

Clone the repository and move into it (the commands below are run from the repository root):

    git clone https://github.com/BUTSpeechFIT/DiaPer.git
    cd DiaPer

Install the packages:

    conda install pip
    pip install git+https://github.com/fnlandini/transformers
    conda install numpy
    conda install -c conda-forge tensorboard
    pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
    pip install safe_gpu
    pip install yamlargparse==1.31.1
    pip install scikit-learn==1.0.2
    pip install decorator==5.1.1
    pip install librosa==0.9.1
    pip install setuptools==59.5.0
    pip install h5py==3.8.0
    pip install matplotlib==3.5.3

Other versions might work, but these are the versions used for this work.
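To check how closely an existing environment matches these pins, a small script along these lines may help. It is a sketch, not part of DiaPer: the helper names `installed_version` and `find_mismatches` are hypothetical, and the `PINNED` dictionary simply mirrors some of the pip commands above. Because `importlib.metadata` only exists from Python 3.8, it falls back to `pkg_resources` (from setuptools) on the Python 3.7 environment recommended here.

```python
"""Hypothetical helper (not part of DiaPer): compare installed package
versions against the versions pinned in the install commands above."""
from typing import Dict, Optional, Tuple

# Subset of the pins above, keyed by pip distribution name.
PINNED: Dict[str, str] = {
    "torch": "1.10.0+cu113",
    "torchaudio": "0.10.0+cu113",
    "yamlargparse": "1.31.1",
    "scikit-learn": "1.0.2",
    "decorator": "5.1.1",
    "librosa": "0.9.1",
    "setuptools": "59.5.0",
    "h5py": "3.8.0",
    "matplotlib": "3.5.3",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python >= 3.8
    except ImportError:
        import pkg_resources  # Python 3.7: provided by setuptools
        try:
            return pkg_resources.get_distribution(dist).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def find_mismatches(installed: Dict[str, Optional[str]],
                    pinned: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str]]:
    """Map every package whose installed version differs from its pin to (found, expected)."""
    return {name: (installed.get(name), want)
            for name, want in pinned.items()
            if installed.get(name) != want}


if __name__ == "__main__":
    found = {name: installed_version(name) for name in PINNED}
    for name, (got, want) in sorted(find_mismatches(found, PINNED).items()):
        print(f"{name}: installed {got}, pinned {want}")
```

A reported mismatch is not necessarily a problem, since other versions might work; it only shows where the environment differs from the one used for this work.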

Run the example

    ./run_example.sh

If it runs without errors, your setup is ready.

Train

To run the training you can call:

    python diaper/train.py -c examples/train.yaml

Note that in the example configuration you need to define the training and validation data directories as well as the output directory. The remaining parameters are the standard ones used in our publication. For adaptation or fine-tuning, the process is similar:

    python diaper/train.py -c examples/finetune_adaptedmorespeakers.yaml

In that case, you also need to provide the path to the trained model that you want to adapt or fine-tune.

Inference

To run the inference, you can call:

    python diaper/infer.py -c examples/infer.yaml

Note that in the example configuration you need to define the data, model and output directories.

Alternatively, to evaluate a single file:

    python diaper/infer_single_file.py -c examples/infer.yaml --wav-dir <directory with wav file> --wav-name <filename without extension>

Here, you need to define the model and output directories in the configuration file; the input audio is given with --wav-dir and --wav-name.
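Since --wav-name expects the file name without its extension, it is easy to pass the wrong value by hand. The sketch below derives both arguments from a WAV path and optionally loops over a directory, calling the script once per file. The helpers `single_file_command` and `run_directory` are hypothetical, and the sketch assumes the input files end in `.wav` and that it is run from the repository root.

```python
"""Hypothetical helper (not part of DiaPer): turn a WAV path into the
--wav-dir / --wav-name arguments expected by diaper/infer_single_file.py."""
import shlex
import subprocess
from pathlib import Path
from typing import List


def single_file_command(config: str, wav_path: str) -> List[str]:
    """Build the command line: --wav-dir is the parent directory,
    --wav-name is the file name without its .wav extension."""
    wav = Path(wav_path)
    if wav.suffix != ".wav":
        raise ValueError(f"expected a .wav file, got {wav_path!r}")
    return ["python", "diaper/infer_single_file.py", "-c", config,
            "--wav-dir", str(wav.parent), "--wav-name", wav.stem]


def run_directory(config: str, wav_dir: str) -> None:
    """Run single-file inference once per .wav file found in wav_dir."""
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        cmd = single_file_command(config, str(wav))
        # shlex.join would need Python >= 3.8, so quote the parts manually.
        print(" ".join(shlex.quote(part) for part in cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing file
```

For example, `single_file_command("examples/infer.yaml", "examples/IS1009a.wav")` yields the arguments `--wav-dir examples --wav-name IS1009a`. For many files, the usual infer.py route with a data directory is probably the better fit; the loop is only meant for a handful of recordings.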

Inference with pre-trained models

You can also run inference with the models we share, either with the usual approach or on a single file. For example, for the model trained on simulated conversations (no fine-tuning):

    python diaper/infer_single_file.py -c examples/infer_16k_10attractors.yaml --wav-dir examples --wav-name IS1009a

or for the version with fine-tuning:

    python diaper/infer_single_file.py -c examples/infer_16k_10attractors_AMIheadsetFT.yaml --wav-dir examples --wav-name IS1009a

You should obtain the same results as in examples/IS1009a_infer_16k_10attractors.rttm and examples/IS1009a_infer_16k_10attractors_AMIheadsetFT.rttm, respectively.
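To check your output against these reference files without eyeballing them, you could compare the RTTMs segment by segment. This is a sketch, not part of DiaPer: `load_rttm` and `same_rttm` are hypothetical names, and it relies only on the standard RTTM field layout (`SPEAKER <file> <chan> <onset> <dur> <NA> <NA> <speaker> <NA> <NA>`). It allows a small tolerance on times in case of tiny numerical differences.

```python
"""Sketch (not part of DiaPer): compare two RTTM files segment by segment,
allowing a small tolerance on onsets and durations."""
from typing import List, Tuple

Segment = Tuple[str, float, float, str]  # (file id, onset, duration, speaker)


def load_rttm(path: str) -> List[Segment]:
    """Read SPEAKER lines: SPEAKER <file> <chan> <onset> <dur> <NA> <NA> <speaker> <NA> <NA>."""
    segments: List[Segment] = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields or fields[0] != "SPEAKER":
                continue  # skip blank lines and other record types
            segments.append((fields[1], float(fields[3]), float(fields[4]), fields[7]))
    return sorted(segments)


def same_rttm(path_a: str, path_b: str, tol: float = 1e-3) -> bool:
    """True if both files contain the same segments (same file ids and speaker labels)."""
    a, b = load_rttm(path_a), load_rttm(path_b)
    if len(a) != len(b):
        return False
    return all(
        id_a == id_b and spk_a == spk_b
        and abs(on_a - on_b) <= tol and abs(dur_a - dur_b) <= tol
        for (id_a, on_a, dur_a, spk_a), (id_b, on_b, dur_b, spk_b) in zip(a, b)
    )
```

Usage would look like `same_rttm("<rttms_dir>/IS1009a.rttm", "examples/IS1009a_infer_16k_10attractors.rttm")`, where the name of the file written inside rttms_dir is an assumption; check what the inference script actually produces.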

All models trained on publicly available and free data are shared in the models folder. Both families of models, with 10 and 20 attractors, are available. To use any of them, modify the inference configuration files above to suit your needs: change models_path and epochs to select the model, and rttms_dir to set where the output will be generated.
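If you switch between several shared models, it may be convenient to generate the configuration files instead of editing them by hand. The sketch below rewrites top-level `key: value` lines of a YAML file as plain text, so comments and all other lines stay untouched and no YAML library is needed. The function `override_keys` is hypothetical; the key names models_path, epochs and rttms_dir come from the paragraph above, but the exact value format each key expects (in particular for epochs) should be copied from the example files.

```python
"""Hypothetical helper (not part of DiaPer): derive a new inference config
from an example YAML file by overriding selected top-level keys."""
import re
from typing import Dict, List, Set


def override_keys(yaml_text: str, overrides: Dict[str, str]) -> str:
    """Replace the value of each top-level `key: value` line named in `overrides`.

    Indented (nested) lines and comments are never matched. An inline comment
    after an overridden value is dropped.
    """
    out: List[str] = []
    seen: Set[str] = set()
    for line in yaml_text.splitlines():
        m = re.match(r"^([A-Za-z_][\w-]*)\s*:", line)
        if m and m.group(1) in overrides:
            key = m.group(1)
            line = f"{key}: {overrides[key]}"
            seen.add(key)
        out.append(line)
    missing = set(overrides) - seen
    if missing:
        raise KeyError(f"keys not found in config: {sorted(missing)}")
    return "\n".join(out) + "\n"
```

For example, read examples/infer_16k_10attractors.yaml, call `override_keys` with new values for models_path, epochs and rttms_dir, write the result to a new file, and pass that file with -c. Raising `KeyError` for keys that are not present avoids silently running with the old values after a typo.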

Results

DER per dataset (FT = fine-tuning; VAD+VBx+OSD = baseline combining voice activity detection, VBx clustering and overlapped speech detection; 📁 = the corresponding RTTMs are shared; -- = no result).

Dataset            10 attractors   10 attractors   20 attractors   20 attractors   VAD+VBx+OSD
                   without FT      with FT         without FT      with FT
AISHELL-4          48.21% 📁       41.43% 📁       47.86% 📁       31.30% 📁       15.84% 📁
AliMeeting (far)   38.67% 📁       32.60% 📁       34.35% 📁       26.27% 📁       28.84% 📁
AliMeeting (near)  28.19% 📁       27.82% 📁       23.90% 📁       24.44% 📁       22.59% 📁
AMI (array)        57.07% 📁       49.75% 📁       52.29% 📁       50.97% 📁       34.61% 📁
AMI (headset)      36.36% 📁       32.94% 📁       35.08% 📁       30.49% 📁       22.42% 📁
Callhome           14.86% 📁       13.60% 📁       --              --              13.62% 📁
CHiME6             78.25% 📁       70.77% 📁       77.51% 📁       69.94% 📁       70.42% 📁
DIHARD 2           43.75% 📁       32.97% 📁       44.51% 📁       31.23% 📁       26.67% 📁
DIHARD 3 full      34.21% 📁       24.12% 📁       34.82% 📁       22.77% 📁       20.28% 📁
DipCo              48.26% 📁       --              43.37% 📁       --              49.22% 📁
Mixer6             21.03% 📁       13.41% 📁       18.51% 📁       10.99% 📁       35.60% 📁
MSDWild            35.69% 📁       15.46% 📁       25.07% 📁       14.59% 📁       16.86% 📁
RAMC               38.05% 📁       21.11% 📁       32.08% 📁       18.69% 📁       18.19% 📁
VoxConverse        23.20% 📁       --              22.10% 📁       --              6.12% 📁
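The table reports the diarization error rate (DER): the fraction of reference speech that is missed, falsely detected, or attributed to the wrong speaker, i.e. DER = (missed speech + false alarm + speaker confusion) / total reference speech. The sketch below computes a simplified frame-level version on binary speaker-activity matrices, choosing the best reference-to-hypothesis speaker mapping by brute force over permutations (only practical for a handful of speakers). It applies no forgiveness collar and is not the scoring setup used to produce the numbers above; `frame_der` is a hypothetical name.

```python
"""Simplified frame-level DER sketch (not the scoring used for the table above)."""
from itertools import permutations
from typing import List, Sequence

# Activity matrix: one row per frame, one column per speaker, 1 = speaker active.
Activity = Sequence[Sequence[int]]


def frame_der(ref: Activity, hyp: Activity) -> float:
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must have the same number of frames")
    n_ref = len(ref[0]) if ref else 0
    n_hyp = len(hyp[0]) if hyp else 0
    n = max(n_ref, n_hyp)
    # Pad with always-silent speakers so both sides have n columns.
    ref_p: List[List[int]] = [list(r) + [0] * (n - n_ref) for r in ref]
    hyp_p: List[List[int]] = [list(h) + [0] * (n - n_hyp) for h in hyp]
    total = sum(sum(r) for r in ref_p)  # total reference speaker-frames
    if total == 0:
        raise ValueError("reference contains no speech")

    # Missed speech and false alarm do not depend on the speaker mapping.
    miss = sum(max(0, sum(r) - sum(h)) for r, h in zip(ref_p, hyp_p))
    fa = sum(max(0, sum(h) - sum(r)) for r, h in zip(ref_p, hyp_p))

    # Confusion: use the mapping ref speaker i -> hyp speaker perm[i]
    # that maximizes the number of correctly attributed speaker-frames.
    best_correct = max(
        sum(1 for r, h in zip(ref_p, hyp_p) for i in range(n) if r[i] and h[perm[i]])
        for perm in permutations(range(n))
    )
    conf = sum(min(sum(r), sum(h)) for r, h in zip(ref_p, hyp_p)) - best_correct
    return (miss + fa + conf) / total
```

Swapping the speaker labels of a perfect hypothesis still gives a DER of 0, because the mapping step absorbs the permutation, which is why end-to-end systems can output speakers in any order.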

Citation

If you use the software, reference its results, or find the repository useful in any way, please cite:

@article{landini2023diaper,
  title={DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors},
  author={Landini, Federico and Diez, Mireia and Stafylakis, Themos and Burget, Luk{\'a}{\v{s}}},
  journal={arXiv preprint arXiv:2312.04324},
  year={2023}
}

If you did not use it for a publication but still found it useful, please let me know by email as well; I would love to hear about it :)

Contact

If you have comments or questions, please contact me at landini@fit.vutbr.cz