Code for Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks

MIT License

Speaker Change Detection using Bi-LSTM

Citation

@inproceedings{Yin2017,
  Author = {Ruiqing Yin and Herv\'e Bredin and Claude Barras},
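  Title = {{Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks}},
  Booktitle = {{Interspeech 2017, 18th Annual Conference of the International Speech Communication Association}},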
  Year = {2017},
  Month = {August},
  Address = {Stockholm, Sweden},
  url = {https://github.com/yinruiqing/change_detection},
}

Installation

Foreword: the code is based on pyannote. A similar README is available for TristouNet.

$ conda create --name change-detection python=2.7 anaconda
$ source activate change-detection
$ conda install gcc
$ conda install -c yaafe yaafe=0.65
$ pip install "pyannote.audio==0.2.1"
$ pip install pyannote.db.etape

What did I just install?

  • keras (and its theano backend) is used for all things deep. To use a GPU, downgrade Keras to version 1.2.0.
  • yaafe is used for MFCC feature extraction in pyannote.audio.
  • pyannote.audio is the core of this project (model architecture, optimizer, sequence generator).
  • pyannote.db.etape is the ETAPE plugin for pyannote.database, a common API for multimedia databases and experimental protocols (e.g. train/dev/test sets definition).

Then, edit ~/.keras/keras.json to configure Keras to use the Theano backend.

$ cat ~/.keras/keras.json
{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}
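
As a quick sanity check, the settings above can be verified from Python without importing Keras. This is a minimal sketch using only the standard library; the helper name `check_keras_config` is hypothetical, and the expected values are simply the ones shown in the listing above.

```python
import json
import os

# Expected values, taken from the keras.json listing above.
EXPECTED = {"backend": "theano", "image_dim_ordering": "th"}


def check_keras_config(path):
    """Return {key: (found, expected)} for every setting that differs."""
    with open(os.path.expanduser(path)) as f:
        config = json.load(f)
    return {
        key: (config.get(key), expected)
        for key, expected in EXPECTED.items()
        if config.get(key) != expected
    }


if __name__ == "__main__":
    config_path = "~/.keras/keras.json"
    if os.path.exists(os.path.expanduser(config_path)):
        mismatches = check_keras_config(config_path)
        print(mismatches or "keras.json matches the expected Theano settings")
    else:
        print("no keras.json found at", config_path)
```

An empty result means both the backend and the dimension ordering match the configuration shown above.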

About the ETAPE database

To reproduce the experiment, you need access to the ETAPE corpus.
It can be obtained from the ELRA catalogue.

However, if you own another corpus with "who speaks when" annotations, you can fork pyannote.db.etape and adapt the code to your own database.

Training and evaluation

To display the usage information, run:

$ pyannote-change-detection -h

change detection

Usage:
    pyannote-change-detection train [--database=<db.yml> --subset=<subset>] <experiment_dir> <database.task.protocol>
    pyannote-change-detection evaluate [--database=<db.yml> --subset=<subset> --epoch=<epoch> --min_duration=<min_duration>] <train_dir> <database.task.protocol>
    pyannote-change-detection apply  [--database=<db.yml> --subset=<subset> --threshold=<threshold> --epoch=<epoch>  --min_duration=<min_duration>] <train_dir> <database.task.protocol>
    pyannote-change-detection -h | --help
    pyannote-change-detection --version
...

An example config file can be found in change_detection/config/. Before training and evaluation, clone this project to a local directory:

$ git clone https://github.com/yinruiqing/change_detection.git

Training

$ pyannote-change-detection train --database change_detection/config/db.yml --subset train change_detection/config Etape.SpeakerDiarization.TV

This is the expected output:

Epoch 1/100
62464/62464 [==============================] - 171s - loss: 0.1543 - acc: 0.9669   
Epoch 2/100
62464/62464 [==============================] - 117s - loss: 0.1375 - acc: 0.9692     
Epoch 3/100
62464/62464 [==============================] - 115s - loss: 0.1376 - acc: 0.9691     
...
Epoch 50/100
62464/62464 [==============================] - 112s - loss: 0.0903 - acc: 0.9724  
...
Epoch 98/100
62464/62464 [==============================] - 115s - loss: 0.0837 - acc: 0.9732     
Epoch 99/100
62464/62464 [==============================] - 112s - loss: 0.0839 - acc: 0.9732     
Epoch 100/100
62464/62464 [==============================] - 112s - loss: 0.0840 - acc: 0.9731
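
To follow convergence from a saved log, the per-epoch loss and accuracy can be extracted from lines like those above. The following is a minimal sketch that assumes the progress-bar format shown here; the function name `parse_keras_log` and the regular expressions are illustrative and not part of pyannote.audio.

```python
import re

# Patterns matching the two kinds of lines in the log above.
EPOCH_RE = re.compile(r"^Epoch (\d+)/(\d+)")
METRIC_RE = re.compile(r"loss: ([0-9.]+) - acc: ([0-9.]+)")


def parse_keras_log(text):
    """Return a list of (epoch, loss, acc) tuples found in a training log."""
    results = []
    epoch = None
    for line in text.splitlines():
        header = EPOCH_RE.match(line.strip())
        if header:
            epoch = int(header.group(1))
            continue
        metrics = METRIC_RE.search(line)
        if metrics and epoch is not None:
            results.append((epoch, float(metrics.group(1)), float(metrics.group(2))))
            epoch = None  # wait for the next "Epoch n/N" header
    return results


# First two epochs copied from the expected output above.
log = """Epoch 1/100
62464/62464 [==============================] - 171s - loss: 0.1543 - acc: 0.9669
Epoch 2/100
62464/62464 [==============================] - 117s - loss: 0.1375 - acc: 0.9692
"""

for epoch, loss, acc in parse_keras_log(log):
    print(epoch, loss, acc)
```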

Evaluation

$ pyannote-change-detection evaluate --database change_detection/config/db.yml --subset development change_detection/config/train/Etape.SpeakerDiarization.TV Etape.SpeakerDiarization.TV 

This is the expected output:

0         95.720% 36.603%
0.0526316 95.526% 49.018%
0.105263  95.213% 57.660%
...
0.526316  92.396% 83.756%
...
0.894737  88.155% 91.139%
0.947368  87.468% 92.046%
1         86.929% 92.672%

The first column is the threshold, the second is purity, and the third is coverage.
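
One way to use this table is to choose the value passed to `--threshold` when running `apply`. The sketch below parses rows in the format above and selects the threshold with the highest coverage among those that reach a minimum purity. The selection rule, the 90% purity target, and the helper names `parse_rows` and `best_threshold` are assumptions made for illustration, not the procedure used in the paper. Only the rows shown above are included; the elided rows are left out.

```python
def parse_rows(text):
    """Parse 'threshold purity% coverage%' lines into tuples of floats."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and elided ("...") rows
        threshold, purity, coverage = parts
        rows.append(
            (float(threshold), float(purity.rstrip("%")), float(coverage.rstrip("%")))
        )
    return rows


def best_threshold(rows, min_purity):
    """Return the row with maximal coverage whose purity is >= min_purity, or None."""
    candidates = [row for row in rows if row[1] >= min_purity]
    return max(candidates, key=lambda row: row[2]) if candidates else None


table = """
0         95.720% 36.603%
0.0526316 95.526% 49.018%
0.105263  95.213% 57.660%
...
0.526316  92.396% 83.756%
...
0.894737  88.155% 91.139%
0.947368  87.468% 92.046%
1         86.929% 92.672%
"""

rows = parse_rows(table)
print(best_threshold(rows, min_purity=90.0))
```

Purity decreases and coverage increases as the threshold grows in the rows shown, so the minimum-purity constraint determines how far up the table the selected threshold can go.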