This repository contains the code used in our paper:
"Self-Supervised Representation Learning With Path Integral Clustering for Speaker Diarization", Prachi Singh and Sriram Ganapathy.
The recipe consists of:
- extracting x-vectors using a pre-trained model
- performing self-supervised clustering (SSC) for diarization on the AMI and Callhome sets
- evaluating results using the diarization error rate (DER)
The following setup is required to run the baseline.
- clone the repository:
git clone https://github.com/iiscleap/SSC.git
- Install Kaldi. If you are a Kaldi novice, please consult the official Kaldi documentation for installation instructions.
- Go to the cloned repository and set the Kaldi path in path.sh:
$ local_dir="Full_path_of_cloned_repository"
$ echo "export KALDI_ROOT=/path_of_kaldi_directory/kaldi" >> $local_dir/path.sh
$ echo "export KALDI_ROOT=/path_of_kaldi_directory/kaldi" >> $local_dir/tools_dir/path.sh
- Create softlinks to the necessary directories:
$ local_dir="Full_path_of_cloned_repository"
$ cd $local_dir/tools_dir
$ . ./path.sh
$ ln -sf $KALDI_ROOT/egs/wsj/s5/utils . # utils dir
$ ln -sf $KALDI_ROOT/egs/wsj/s5/steps . # steps dir
- For the Callhome set: input x-vector features are obtained using the Kaldi Callhome diarization recipe. The pre-trained x-vector model and the PLDA model, including the global mean and PCA transform needed for training, are given in tools_diar/callhome_xvector_models.
- For the AMI set: input x-vector features are obtained using the [DIHARD-II Evaluation Repo](https://github.com/iiscleap/DIHARD_2019_baseline_alltracks/tree/master/data). The pre-trained x-vector model and the PLDA model, including the global mean and PCA transform needed for training, are given in tools_diar/AMI_xvector_models.
- Performance is evaluated using dscore. Install all the required dependencies in the same Python environment.
- This step runs the Kaldi diarization pipeline up to x-vector extraction using the pre-trained model.
- It also converts the x-vectors from Kaldi ark format to numpy format for use in PyTorch, and converts the Kaldi PLDA model into pickle format.
For Callhome
- Replace "data_root" with the path of the Callhome dataset in tools_diar/run_extract_xvectors.sh
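As a sketch, the same replacement can be scripted with `sed` instead of editing the file by hand. This assumes the script assigns `data_root` on its own line; the dataset path below is a placeholder for your local Callhome copy, and the command is run from inside tools_diar:

```shell
# Point data_root at the local Callhome copy (placeholder path).
# Assumes a line of the form "data_root=..." exists in the script.
sed -i 's|^data_root=.*|data_root=/path/to/callhome|' run_extract_xvectors.sh
```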
- Run the following commands:
$ local_dir="Full_path_of_cloned_repository"
$ cd $local_dir/tools_diar
$ bash run_extract_xvectors.sh
For AMI
- Replace "data_root" with the path of the AMI dataset in tools_diar/run_extract_xvectors_ami.sh
- Run the following commands:
$ local_dir="Full_path_of_cloned_repository"
$ cd $local_dir/tools_diar
$ bash run_extract_xvectors_ami.sh
- xvec_SSC_train.py is the SSC training code.
- run_xvec_ssc_ch.sh calls the SSC training script for the Callhome set.
- run_xvec_ssc_ami.sh calls the SSC training script for the AMI set.
- Update the training parameters in the respective scripts.
NOTE that by default Kaldi scripts are configured for execution on a grid using a submission engine such as SGE or Slurm. If you are running the recipes on a single machine, make sure to edit
cmd.sh
and tools_dir/cmd.sh
so that the line
$ export train_cmd="queue.pl"
reads
$ export train_cmd="run.pl"
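This edit can also be scripted; a minimal sketch, assuming both cmd.sh files contain the line exactly as shown above (run from the repository root):

```shell
# Switch the job launcher from the grid engine (queue.pl) to local
# execution (run.pl) in both cmd.sh files.
sed -i 's/queue\.pl/run.pl/' cmd.sh tools_dir/cmd.sh
```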
- Execute the following commands:
$ local_dir="Full_path_of_cloned_repository"
$ cd $local_dir
$ bash run_xvec_ssc_ch.sh $local_dir --TYPE parallel --nj <number of jobs> --which_python <python_env_with_all_installed_libraries> # for callhome
$ bash run_xvec_ssc_ami.sh $local_dir --TYPE parallel --nj <number of jobs> --which_python <python_env_with_all_installed_libraries> # for ami
Note: use --TYPE parallel when running multiple jobs simultaneously.
- The diarization error rate (DER) is used as the performance metric.
- The scripts in dscore generate filewise DER.
- Go to the cloned repo and run the following commands for evaluation:
$ local_dir="Full_path_of_cloned_repository"
$ cd $local_dir
$ find $local_dir/rttm_ground/*.rttm > ref_callhome.scp
$ cd tools_diar/
$ bash gen_rttm.sh --DATA <Callhome/Ami> --stage <1/2> --modelpath <path of model to evaluate> --which_python <python_env_with_all_installed_libraries>
Note: --stage 1 uses the ground-truth number of speakers; --stage 2 uses a threshold-based number of clusters.
- Generates der.scp in modelpath, which contains the filewise DER and other metrics such as JER.
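To get a single summary number from the filewise scores, the entries in der.scp can be averaged. This is a hypothetical sketch: it assumes each der.scp line has the form `<recording> <DER>`, which should be checked against the actual file before use:

```shell
# Hypothetical: assuming each der.scp line is "<recording> <DER>",
# average the filewise DER values into one number.
awk '{ sum += $2; n++ } END { printf "Average DER: %.2f%%\n", sum / n }' der.scp
```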
If you are using this resource, please cite as follows:
@ARTICLE{9415169,
author={Singh, Prachi and Ganapathy, Sriram},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
title={Self-Supervised Representation Learning With Path Integral Clustering for Speaker Diarization},
year={2021},
volume={29},
pages={1639-1649},
doi={10.1109/TASLP.2021.3075100}}
If you have any comments or questions, please contact prachisingh@iisc.ac.in.