MetricAug: A Distortion Metric-Lead Augmentation Strategy for Training Noise-Robust Speech Emotion Recognizer Official Implementation
This repository is the official implementation of MetricAug: A Distortion Metric-Lead Augmentation Strategy for Training Noise-Robust Speech Emotion Recognizer.
conda create -n metricaug python=3.8
pip install scikit-learn
pip install joblib
pip install pandas
pip install tqdm
pip install librosa
pip install soundfile
pip install fairseq
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install https://github.com/schmiph2/pysepm/archive/master.zip
Please download the datasets you want to use.
MSP-Podcast: https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html
MELD: https://affective-meld.github.io/
MUSAN: https://www.openslr.org/17/
ESC-50: https://github.com/karolpiczak/ESC-50
If you use these datasets, please cite the corresponding papers from the original authors.
If you use MELD, please install ffmpeg and run the following command to extract the audio from each video file:
ffmpeg -i input.mp4 output.wav
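MELD ships many video clips, so the command above usually has to be repeated per file. The following is a minimal sketch of one way to do that from Python. The function names `build_ffmpeg_command` and `convert_folder`, the folder layout, and the added `-y` (overwrite) flag are illustrative assumptions and are not part of this repository.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path):
    """Return the argument list for `ffmpeg -i input.mp4 output.wav`."""
    # -y (an assumption, not in the README command) overwrites existing outputs.
    return ["ffmpeg", "-y", "-i", str(video_path), str(wav_path)]


def convert_folder(video_dir, wav_dir, dry_run=False):
    """Convert every .mp4 under video_dir to a .wav under wav_dir.

    The relative sub-folder structure is mirrored, so clips that share a
    file name in different splits do not overwrite each other.
    """
    video_dir, wav_dir = Path(video_dir), Path(wav_dir)
    commands = []
    for video_path in sorted(video_dir.rglob("*.mp4")):
        wav_path = wav_dir / video_path.relative_to(video_dir).with_suffix(".wav")
        wav_path.parent.mkdir(parents=True, exist_ok=True)
        cmd = build_ffmpeg_command(video_path, wav_path)
        commands.append(cmd)
        if not dry_run:
            # Requires ffmpeg on PATH; raises CalledProcessError on failure.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `convert_folder("MELD/videos", "MELD/wav", dry_run=True)` (hypothetical paths) returns the commands without running ffmpeg, which is a convenient way to check the output paths first.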
Run these scripts to generate the superset:
preprocessing/add_musan.py
preprocessing/add_esc50_random.py
feature_extract/vqwav2vec_extract_folder_recursive.py
Run this script to complete the feature extraction.
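For readers unfamiliar with how a noisy superset is typically built, here is a minimal sketch of mixing a noise clip into a speech clip at a target signal-to-noise ratio. It is a generic illustration, not the logic of `add_musan.py` or `add_esc50_random.py`. The function name `mix_at_snr` is hypothetical, and the SNR values 0, 5, and 10 dB in the usage note are an assumption based on the test script name `test_musan_0_5_10_aug_GRU-TFM_main.py`.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the speech-to-noise power ratio equals snr_db."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Tile, then crop, the noise so it covers the whole utterance.
    repeats = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, repeats)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose gain g so that speech_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```

With waveforms loaded as 1-D arrays (for example via `soundfile.read`), `mix_at_snr(speech, noise, 5)` would return a 5 dB mixture of the same length as `speech`. The sketch assumes both signals are non-silent and share a sampling rate.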
Run this script to compute the speech distortion metrics (fwSNRseg, STOI, and PESQ):
preprocessing/metric/compute_se_metric.py
If you run into problems computing PESQ, use preprocessing/metric/re_compute_pesq.py to fix them.
preprocessing/metric/merge_to_parse_meta.py
Once you are done, run this script to merge all the noisy data.
preprocessing/metric/se_metric_statistical_by_gmm_metric.py
preprocessing/metric/se_metric_statistical_by_rank_metric.py
Use these two scripts to cluster the speech distortion metrics into levels; the default number of levels is 5.
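To make the idea of distortion levels concrete, the sketch below assigns each utterance a level from its rank among all metric scores, giving equal-sized bins. It only illustrates the rank-based variant in a simplified form; the GMM variant would instead fit a mixture model to the scores. The function name `rank_levels` is hypothetical and the repository scripts may differ in detail.

```python
def rank_levels(scores, num_levels=5):
    """Assign each score a level in 0..num_levels-1 based on its rank.

    Level 0 holds the lowest scores and level num_levels-1 the highest.
    """
    n = len(scores)
    # Indices of the scores sorted from lowest to highest value.
    order = sorted(range(n), key=lambda i: scores[i])
    levels = [0] * n
    for rank, idx in enumerate(order):
        # Integer division maps ranks 0..n-1 onto levels 0..num_levels-1.
        levels[idx] = rank * num_levels // n
    return levels


# Example with ten made-up STOI-like scores.
example = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0]
print(rank_levels(example))  # -> [4, 0, 2, 1, 3, 0, 3, 1, 2, 4]
```

Rank-based binning keeps the levels balanced in size regardless of how the scores are distributed, whereas a mixture model lets the level boundaries follow the shape of the score distribution.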
If you have any questions about the code I/O, we provide examples in example_meta; please check the format and file paths.
The training script is:
train_metric_aug_GRU-TFM_main.py
data_sample_weight.py implements Algorithm 1 of our paper.
test_musan_0_5_10_aug_GRU-TFM_main.py
test_esc50_GRU-TFM_main.py
These are the inference scripts for our model. We also provide the best-performing experiments reported in our paper, in the folders exp/original_exp/MELD_stoi_gmm and exp/original_exp/MSP_stoi_gmm.
TO DO LIST:
- Optimize the code.
- Write a bash script that runs the whole pipeline.
- Document the code I/O in detail.