EACL 2024 - Improving Acoustic Word Embeddings through Correspondence Training of Self-supervised Speech Representations
Download the force-aligned dataset (timestamps, word list):
Option 1: From Drive: MLS_force_aligned
Option 2:
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/datasets/Trikaldarshi/MLS_AWE
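If git-lfs is not convenient, the same dataset repository can presumably also be fetched with the huggingface_hub library (an alternative route, not part of the original instructions); a minimal sketch:
from huggingface_hub import snapshot_download
# Download the Trikaldarshi/MLS_AWE dataset repository into a local folder.
snapshot_download(repo_id="Trikaldarshi/MLS_AWE", repo_type="dataset", local_dir="MLS_AWE")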
Download corresponding speech corpora: https://www.openslr.org/94/
Note: for the English speech corpus, please download only partaa (due to the huge amount of data): https://dl.fbaipublicfiles.com/mls/mls_english_parts_list.txt
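The force-aligned dataset stores per-utterance MFA alignments as CSV files. A minimal sketch of inspecting one of them with pandas; the column names (Begin, End, Label, Type) follow MFA's CSV export and should be checked against the actual files, and the file path below is a placeholder:
import pandas as pd

# Read one alignment CSV and keep only word-level rows (assumed columns: Begin, End, Label, Type).
align = pd.read_csv("MLS_force_aligned/mls_english/train/some_utterance.csv")
words = align[align["Type"] == "words"][["Begin", "End", "Label"]]
print(words.head())  # start time (s), end time (s), word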
If you want to force-align the dataset yourself, you may use the following commands via the MFA toolkit.
Please arrange the data files in the directory structure required by MFA; the code in python prepare_data.py can be used, with some modification, to do that (see the layout sketch after the commands below).
conda activate mfa ## activate an environment with the MFA toolkit installed
mfa models download acoustic english_mfa
mfa models download dictionary english_us_mfa
mfa align --clean ....your path/MLS_processed/mls_english/train/ english_us_mfa english_mfa ....your path/MLS_force_aligned/mls_english/train/ --output_format=csv --beam 100 --retry_beam 400
mfa align --clean ....your path/MLS_processed/mls_english/dev/ english_us_mfa english_mfa ....your path/MLS_force_aligned/mls_english/dev/ --output_format=csv --beam 100 --retry_beam 400
mfa align --clean ....your path/MLS_processed/mls_english/test/ english_us_mfa english_mfa ....your path/MLS_force_aligned/mls_english/test/ --output_format=csv --beam 100 --retry_beam 400
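As mentioned above, MFA expects a corpus directory in which each audio file sits next to a transcript with the same base name (e.g. speaker subfolders containing utt.wav and utt.lab). A hedged sketch of what prepare_data.py needs to produce; the corpus path, utterance id, and transcripts dict below are placeholders:
from pathlib import Path

# Hypothetical mapping from utterance id (relative path) to its transcript text.
transcripts = {"speaker1/utt1": "hello world"}

corpus_root = Path("MLS_processed/mls_english/train")
for utt_id, text in transcripts.items():
    wav_path = corpus_root / f"{utt_id}.wav"       # audio is expected to already be here
    lab_path = wav_path.with_suffix(".lab")        # transcript next to it, same base name
    lab_path.parent.mkdir(parents=True, exist_ok=True)
    lab_path.write_text(text + "\n")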
Create the conda environment for running the scripts in this repository:
conda create --name myenv --file spec-file.txt
Already prepared metadata is available in the /metadata folder, OR use python prepare_metadata.py
to get train_metadata.csv, dev_metadata.csv, and test_metadata.csv for all the languages separately. Change the paths in the code for the various languages.
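As a rough, hedged illustration of the kind of table prepare_metadata.py produces (the actual columns and filtering used by the script may differ), word-level rows can be collected from the alignment CSVs like this:
import glob
import pandas as pd

rows = []
for csv_path in glob.glob("MLS_force_aligned/mls_english/train/**/*.csv", recursive=True):
    align = pd.read_csv(csv_path)
    words = align[align["Type"] == "words"]        # assumed MFA CSV schema
    for _, r in words.iterrows():
        rows.append({"word": r["Label"], "start": r["Begin"], "end": r["End"],
                     "alignment_csv": csv_path})
pd.DataFrame(rows).to_csv("train_metadata.csv", index=False)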
Extract SSL features with context (wc):
For HuBERT: python extract_ssl.py @config_files/extract_hubert.txt
For Wav2vec2: python extract_ssl.py @config_files/extract_wav2vec2.txt
For WavLM: python extract_ssl.py @config_files/extract_wavlm.txt
Extract SSL features without context (woc):
For HuBERT: python extract_ssl_woc.py @config_files/extract_hubert.txt
For Wav2vec2: python extract_ssl_woc.py @config_files/extract_wav2vec2.txt
For WavLM: python extract_ssl_woc.py @config_files/extract_wavlm.txt
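My reading of the two scripts (to be confirmed against the code): extract_ssl.py runs the SSL model on the full utterance and then keeps the frames belonging to the word (with context), while extract_ssl_woc.py crops the word out of the waveform first (without context). A minimal sketch with torchaudio's HuBERT bundle, assuming 16 kHz audio and a 20 ms frame stride; the path, timestamps, and layer choice are placeholders:
import torch
import torchaudio

bundle = torchaudio.pipelines.HUBERT_BASE
model = bundle.get_model().eval()

wav, sr = torchaudio.load("utterance.wav")                       # placeholder path
wav = torchaudio.functional.resample(wav, sr, bundle.sample_rate)
start_s, end_s = 1.20, 1.75                                      # word timestamps from the alignment CSV

with torch.inference_mode():
    # With context: encode the whole utterance, then keep the word's frames (20 ms per frame).
    layers, _ = model.extract_features(wav)
    frames = layers[-1][0]                                       # (num_frames, feat_dim), last layer as an example
    word_feats_wc = frames[int(start_s / 0.02):int(end_s / 0.02)]

    # Without context: crop the waveform to the word first, then encode only that segment.
    segment = wav[:, int(start_s * bundle.sample_rate):int(end_s * bundle.sample_rate)]
    layers_woc, _ = model.extract_features(segment)
    word_feats_woc = layers_woc[-1][0]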
Extract MFCC features:
python extract_mfcc.py @config_files/extract_mfcc.txt
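The exact MFCC settings live in config_files/extract_mfcc.txt; as a rough equivalent, frame-level MFCCs can be computed with torchaudio (the numbers below are common defaults, not necessarily the repository's):
import torchaudio

mfcc = torchaudio.transforms.MFCC(
    sample_rate=16000,
    n_mfcc=13,                                                   # 13 cepstral coefficients
    melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 23},   # 25 ms window, 10 ms hop
)
wav, sr = torchaudio.load("word_segment.wav")                    # placeholder path
feats = mfcc(wav)                                                # (channels, n_mfcc, num_frames)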
Train CAE models:
For HuBERT: python cae.py @config_files/cae_hubert.txt
For Wav2vec2: python cae.py @config_files/cae_wav2vec2.txt
For WavLM: python cae.py @config_files/cae_wavlm.txt
For MFCC: python cae.py @config_files/cae_mfcc.txt
Similarly for AE models.
Change the --metadata_file path to point to either the woc (without context) or wc (with context) features in /config_files/cae** or /config_files/ae**.
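For orientation, the core idea of correspondence training: encode one spoken instance of a word and train the decoder to reconstruct the frames of a different instance of the same word; the fixed-dimensional encoder state is the acoustic word embedding. The sketch below is a generic CAE-RNN in PyTorch under assumed dimensions, not the repository's cae.py (see the config files for the actual architecture and hyperparameters):
import torch
import torch.nn as nn

class CAERNN(nn.Module):
    # Generic correspondence autoencoder: GRU encoder -> embedding -> GRU decoder.
    def __init__(self, feat_dim=768, hidden_dim=256, embed_dim=128):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.to_embedding = nn.Linear(hidden_dim, embed_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_frames = nn.Linear(hidden_dim, feat_dim)

    def embed(self, x):
        # x: (batch, frames, feat_dim) -> acoustic word embedding (batch, embed_dim)
        _, h = self.encoder(x)
        return self.to_embedding(h[-1])

    def forward(self, x, num_target_frames):
        z = self.embed(x)
        # Feed the embedding at every decoder step and predict the target frames.
        dec_in = z.unsqueeze(1).repeat(1, num_target_frames, 1)
        dec_out, _ = self.decoder(dec_in)
        return self.to_frames(dec_out)

# One training step: x and y are padded feature segments of two instances of the same word.
model = CAERNN()
x = torch.randn(4, 60, 768)
y = torch.randn(4, 60, 768)
loss = nn.functional.mse_loss(model(x, y.shape[1]), y)
loss.backward()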
python eval_awe.py @config_files/eval_awe.txt
Change the --model_weights and --metadata_file according to the language and model you want to evaluate on the word-discrimination task.
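The word-discrimination (same-different) task scores AWEs by how well cosine distance separates pairs of the same word from pairs of different words, usually reported as average precision. A hedged sketch of that metric (eval_awe.py is the authoritative implementation):
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist
from sklearn.metrics import average_precision_score

def same_different_ap(embeddings, words):
    # embeddings: (N, D) array of AWEs; words: length-N list of word labels.
    dists = pdist(embeddings, metric="cosine")                   # condensed pairwise distances
    same = np.array([a == b for a, b in combinations(words, 2)], dtype=int)
    # Smaller distance should mean "same word", so negate distances as scores.
    return average_precision_score(same, -dists)

# Example with random embeddings (placeholder data).
ap = same_different_ap(np.random.randn(10, 128), ["cat", "dog"] * 5)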
python pooling_eval.py
Please change the metadata_filepath inside the code as per your needs.
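My understanding of pooling_eval.py (worth confirming in the code) is that it evaluates simple pooling of the frame-level features, with no learned encoder, as a baseline AWE; for example, mean pooling:
import torch

frames = torch.randn(60, 768)   # frame-level features for one word segment (placeholder)
awe = frames.mean(dim=0)        # mean-pooled acoustic word embedding, shape (768,)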