
Learning Efficient Representations for Keyword Spotting with Triplet Loss



Code for the paper Learning Efficient Representations for Keyword Spotting with Triplet Loss
by Roman Vygon (roman.vygon@gmail.com) and Nikolay Mikhaylovskiy (nickm@ntr.ai).

Prerequisites

Training

To train a triplet encoder run:

python TripletEncoder.py --name=test_encoder --manifest=MANIFEST --model=MODEL 

To train a baseline (non-triplet) model, or to train a classifier on top of the triplet encoder, run:

python TripletClassifier.py --name=test_classifier --manifest=MANIFEST --model=MODEL

You can use --help to view the description of arguments.
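The encoder is trained to pull embeddings of clips of the same keyword together and push different keywords apart. A minimal NumPy sketch of a triplet margin loss of this kind is below; the margin value and the L2 normalization are illustrative assumptions, not necessarily the settings used in the paper or in TripletEncoder.py.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet margin loss over L2-normalized embeddings (illustrative sketch).

    anchor and positive are embeddings of clips with the same keyword label;
    negative is an embedding of a different keyword. The margin here is an
    assumption for illustration only.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = normalize(anchor), normalize(positive), normalize(negative)
    d_pos = np.linalg.norm(a - p, axis=-1)  # distance to same-keyword clip
    d_neg = np.linalg.norm(a - n, axis=-1)  # distance to other-keyword clip
    # Hinge: penalize when the positive is not closer than the negative by margin
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```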

Hardware Requirements

Training was performed on a single Tesla K80 12GB.

Model  | Batch Size | VRAM
-------|------------|-----
Res15  | 35*4       | 11GB
Res8   | 35*10      | 4GB

Testing

To test a triplet encoder run:

python infer_train.py --name=test_encoder --manifest=MANIFEST --model=MODEL --enc_step=ENCODER_TRAINING_STEP

To test a classifier-head model run:

python infer_notl.py --name=test_encoder --cl_name=test_classifier --manifest=MANIFEST --model=MODEL --enc_step=ENCODER_TRAINING_STEP --cl_step=CLASSIFIER_TRAINING_STEP

You can use --help to view the description of arguments.
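Evaluating a triplet encoder amounts to classifying test clips by nearest neighbours in embedding space. The sketch below shows a k-nearest-neighbour decision rule of that kind over precomputed embeddings; it is a simplified stand-in, not the actual logic of infer_train.py, and the choice of Euclidean distance and k is an assumption.

```python
import numpy as np

def knn_predict(train_emb, train_labels, test_emb, k=5):
    """Classify test embeddings by majority vote among their k nearest
    training embeddings (Euclidean distance). Illustrative sketch only;
    the real inference scripts load checkpoints and compute embeddings."""
    # Pairwise distances, shape (n_test, n_train)
    dists = np.linalg.norm(test_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    # Indices of the k closest training embeddings for each test clip
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_labels[nearest]
    # Majority vote per test clip
    return np.array([np.bincount(v).argmax() for v in votes])
```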

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Datasets

LibriSpeech

You can download the train-clean-360 subset here: http://www.openslr.org/12. If the site doesn't load, see this code for direct links to the files.

Google Speech Commands

Use this notebook to download and prepare the Google Speech Commands dataset.

Additional files

Data manifests, LibriSpeech alignments, and distance measures can be found here. You'll need to update the manifests.json file with your dataset path. You can convert LibriWords manifests with convert_path_prefix.ipynb.

Sadly, the files went missing. I'll try to recover them; if you had a chance to download them, please contact me.