A Prototypical Network Approach for Evaluating Generated Emotional Speech

This repository contains code for the INTERSPEECH 2021 paper 📄 "A Prototypical Network Approach for Evaluating Generated Emotional Speech", by Alice Baird, Silvan Mertes, Manuel Milling, Lukas Stappen, Thomas Wiest, Elisabeth André, and Björn W. Schuller.

The prototypical network applied here is adapted from the PyTorch implementation of Snell et al.
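For orientation, below is a minimal sketch of the core prototypical-network computation from Snell et al. (class prototypes as mean support embeddings, queries classified by distance to the prototypes). The function and variable names are illustrative only and are not taken from this repository.

```python
import torch
import torch.nn.functional as F

def prototypical_loss(support, support_labels, query, query_labels, n_classes):
    """Prototypical loss (Snell et al., 2017): each class prototype is the
    mean of its support embeddings; queries are scored by negative squared
    Euclidean distance to the prototypes. Names here are illustrative."""
    # One prototype per class: mean of that class's support embeddings.
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(n_classes)
    ])                                           # (n_classes, embed_dim)

    # Squared Euclidean distance of every query to every prototype.
    dists = torch.cdist(query, prototypes) ** 2  # (n_query, n_classes)

    # Softmax over negative distances yields class log-probabilities.
    log_p = F.log_softmax(-dists, dim=1)
    return F.nll_loss(log_p, query_labels)
```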

Here we include code for the adapted prototypical network and for the embedding-space evaluation. We also include augmentation_options.py, which implements the data augmentation methods applied. More detail on SpecAugment can be found here. For audio generation, WaveGAN was applied.
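As a rough illustration of the masking idea behind SpecAugment (a sketch, not the exact code in augmentation_options.py), a time/frequency-masking pass over a spectrogram can be written as follows; the parameter values are placeholders.

```python
import numpy as np

def spec_augment(spec, max_freq_mask=8, max_time_mask=16, rng=None):
    """SpecAugment-style masking (Park et al., 2019): zero out one random
    band of frequency bins and one random span of time frames.
    `spec` is a (freq_bins, time_frames) magnitude spectrogram.
    Mask sizes are illustrative defaults, not the paper's settings."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_freq, n_time = spec.shape

    # Frequency mask: zero a random band of rows.
    f = rng.integers(0, max_freq_mask + 1)
    f0 = rng.integers(0, n_freq - f + 1)
    spec[f0:f0 + f, :] = 0.0

    # Time mask: zero a random span of columns.
    t = rng.integers(0, max_time_mask + 1)
    t0 = rng.integers(0, n_time - t + 1)
    spec[:, t0:t0 + t] = 0.0
    return spec
```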

Any questions feel free to reach out! 📧 alicebaird@ieee.org

Setup ⚙️

  1. Install virtualenv, then create and activate a new environment
pip3 install virtualenv 
virtualenv .protonets 
source .protonets/bin/activate
  2. Install the requirements
pip install -r requirements.txt
  3. Due to data-sharing limitations, we share only the WaveGAN-generated spectrogram images (based on the original training set). To test the code, you can unzip the archive included here. If you would like the GEMEP sub-set used in the publication, get in touch.

Train and Test 🚂

This version of the code has been adapted to run without a GPU.

  1. Train the network with the original spectrograms.

bash train.sh model_name

  2. Test the network utilising the generated data.

bash test_spec_gen.sh model_name

Pair-wise embedding space diversity

  1. To utilise the embeddings from all experiments of Baird et al., download these and place them under embeddings/.

  2. Run embedding_diversity_analysis.py to calculate the average pair-wise distance between points from the source samples and from each augmentation technique, for each emotion (see the sketch after this list).

  3. Results are stored as CSV files (with French emotion labels) and as heatmaps (with English emotion labels) in the folder result_pairwise_distance/.
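As a sketch of the distance computation described above, assuming the embeddings are loaded as NumPy arrays (the function and array names are assumptions, not the script's actual interface):

```python
import numpy as np
from scipy.spatial.distance import cdist

def mean_pairwise_distance(source_emb, augmented_emb):
    """Average Euclidean distance between every source embedding and every
    embedding from one augmentation technique, for a single emotion.
    Both inputs are (n_samples, embed_dim) arrays; names are placeholders."""
    return cdist(source_emb, augmented_emb).mean()
```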

Visualise prototypes and embedding space 👀

We also include a script (tsne-plot.py) to visualise the embedding space more easily (using the same embeddings as in the previous step).
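As a sketch of what such a visualisation involves, assuming the embeddings are loaded as a NumPy array with per-sample emotion labels (names are placeholders, not the actual interface of tsne-plot.py):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(embeddings, labels, out_path="tsne_embeddings.png"):
    """Project embeddings to 2D with t-SNE and colour points by emotion.
    `embeddings`: (n_samples, embed_dim) array; `labels`: per-sample
    emotion strings. Both are placeholders for the downloaded embeddings."""
    projected = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
    labels = np.asarray(labels)
    for emotion in np.unique(labels):
        pts = projected[labels == emotion]
        plt.scatter(pts[:, 0], pts[:, 1], label=str(emotion), s=10)
    plt.legend()
    plt.savefig(out_path)
```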

Citation and Contributors

If you use the code from this repository, please add the following citation to your paper:

A. Baird, S. Mertes, M. Milling, L. Stappen, T. Wiest, E. André, and B. W. Schuller, “A Prototypical Network Approach for Evaluating Generated Emotional Speech,” in Proc. INTERSPEECH 2021. Brno, Czech Republic: ISCA, Sep. 2021, p. [to appear].

@inproceedings{baird2021interspeech,
    title={{A Prototypical Network Approach for Evaluating Generated Emotional Speech}},
    author={Baird, Alice and Mertes, Silvan and Milling, Manuel and Stappen, Lukas and Wiest, Thomas and Andr\'{e}, Elisabeth and Schuller, Bj\"{o}rn W.},
    address={Brno, Czech Republic},
    booktitle={Proc. INTERSPEECH 2021},
    organization={ISCA},
    month={Sep.},
    year={2021},
    pages={[to appear]}
}

Thanks to the contributors of this repository 🥰.


Alice · Manuel · Thomas