agwe-recipe

This recipe trains acoustic word embeddings (AWEs) and acoustically grounded word embeddings (AGWEs) on paired data consisting of word labels (given by their character sequences) and spoken word segments.

The training objective is based on the multiview triplet loss of He et al., 2016. Hard negative sampling was added in Settle et al., 2019 to improve training speed (similar to src/multiview_triplet_loss_old.py). The current version (see src/multiview_triplet_loss.py) uses semi-hard negative sampling (Schroff et al., 2015) instead of hard negative sampling and includes obj1 from He et al. in the loss.
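As a rough illustration of semi-hard negative sampling (not the repo's actual implementation; function and variable names here are hypothetical), one cross-view direction of the triplet objective might be sketched as follows:

```python
import torch
import torch.nn.functional as F

def semi_hard_crossview_triplet_loss(acoustic, text, margin=0.4):
    """Sketch of one cross-view triplet term with semi-hard negatives.

    acoustic: (N, D) acoustic word embeddings
    text:     (N, D) character-sequence (text) embeddings, row-aligned so
              that row i of each matrix corresponds to the same word.
    """
    a = F.normalize(acoustic, dim=1)
    t = F.normalize(text, dim=1)
    dist = 1.0 - a @ t.t()                 # (N, N) cosine distances
    pos = dist.diag().unsqueeze(1)         # d(x_i, c_i), the positive pair
    # Semi-hard negatives: farther than the positive, but inside the margin.
    semi = (dist > pos) & (dist < pos + margin)
    semi.fill_diagonal_(False)
    inf = torch.full_like(dist, float("inf"))
    semi_dist = torch.where(semi, dist, inf)
    # Fall back to the hardest (closest) negative when no semi-hard one exists.
    eye = torch.eye(dist.size(0), dtype=torch.bool, device=dist.device)
    hardest = dist.masked_fill(eye, float("inf")).min(dim=1).values
    neg = torch.where(torch.isinf(semi_dist).all(dim=1),
                      hardest, semi_dist.min(dim=1).values)
    return F.relu(margin + pos.squeeze(1) - neg).mean()
```

The full multiview loss also includes the symmetric term with text embeddings as anchors (and, per the note above, obj1), which follows the same pattern with the roles of the two views swapped.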

Dependencies

Python 3, PyTorch 1.4, h5py, NumPy, SciPy
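Assuming a pip-based environment, the dependencies can be installed with something like the following (only the PyTorch version is stated by this recipe; the other packages are unpinned):

```shell
pip install "torch==1.4.*" h5py numpy scipy
```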

Dataset (for the purpose of TTIC 31110)

Use this link to download the dataset.

Training

Edit train_config.json and run train.sh

./train.sh

Evaluate

Edit eval_config.json and run eval.sh

./eval.sh

Results

With the default train_config.json you should obtain the following results:

acoustic_ap = 0.79

crossview_ap = 0.75
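For reference, cross-view average precision is conventionally computed by ranking all acoustic/text pairs by similarity and scoring pairs whose word labels match. A minimal NumPy sketch (hypothetical names; not the repo's evaluation code):

```python
import numpy as np

def crossview_ap(acoustic, text, labels_a, labels_t):
    """Average precision for acoustic-to-text retrieval.

    Every (acoustic, text) pair is ranked by cosine similarity; a pair
    counts as correct when its word labels match.
    """
    a = acoustic / np.linalg.norm(acoustic, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    sim = (a @ t.T).ravel()
    match = (np.asarray(labels_a)[:, None] == np.asarray(labels_t)[None, :]).ravel()
    order = np.argsort(-sim)               # most similar pairs first
    match = match[order]
    precision = np.cumsum(match) / (np.arange(len(match)) + 1)
    return precision[match].mean()         # mean precision at each correct pair
```

With perfectly separated embeddings this yields an AP of 1.0; the acoustic_ap metric is analogous but ranks acoustic-to-acoustic pairs.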

Acknowledgement

This repo is forked from Shane Settle's agwe-recipe repo.