TAPLoss
Official PyTorch implementation of TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement, accepted to ICASSP 2023.
Prerequisites
TAPLoss requires NumPy and PyTorch; install versions that are compatible with your project. Refer to Demucs and FullSubNet for the packages each of them requires.
Dataset
The TAP estimator is trained on the DNS 2020 dataset. Download the DNS Interspeech 2020 dataset from here and follow the instructions to prepare the dataset.
Temporal Acoustic Parameters
Ground-truth temporal acoustic parameters are calculated using openSMILE. The 25 temporal acoustic parameters are:
- Loudness_sma3
- alphaRatio_sma3
- hammarbergIndex_sma3
- slope0-500_sma3
- slope500-1500_sma3
- spectralFlux_sma3
- mfcc1_sma3
- mfcc2_sma3
- mfcc3_sma3
- mfcc4_sma3
- F0semitoneFrom27.5Hz_sma3nz
- jitterLocal_sma3nz
- shimmerLocaldB_sma3nz
- HNRdBACF_sma3nz
- logRelF0-H1-H2_sma3nz
- logRelF0-H1-A3_sma3nz
- F1frequency_sma3nz
- F1bandwidth_sma3nz
- F1amplitudeLogRelF0_sma3nz
- F2frequency_sma3nz
- F2bandwidth_sma3nz
- F2amplitudeLogRelF0_sma3nz
- F3frequency_sma3nz
- F3bandwidth_sma3nz
- F3amplitudeLogRelF0_sma3nz
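As a rough illustration of how frame-level parameters like these can be extracted, here is a minimal sketch using the `opensmile` Python package. It assumes the 25 parameters correspond to the eGeMAPSv02 low-level descriptors; this README does not state which openSMILE configuration was used for the ground truth, so treat the feature set as an assumption. `TAP_NAMES` and `extract_taps` are hypothetical names introduced for this example only.

```python
# The 25 frame-level parameter names listed above, in the same order.
TAP_NAMES = [
    "Loudness_sma3", "alphaRatio_sma3", "hammarbergIndex_sma3",
    "slope0-500_sma3", "slope500-1500_sma3", "spectralFlux_sma3",
    "mfcc1_sma3", "mfcc2_sma3", "mfcc3_sma3", "mfcc4_sma3",
    "F0semitoneFrom27.5Hz_sma3nz", "jitterLocal_sma3nz",
    "shimmerLocaldB_sma3nz", "HNRdBACF_sma3nz",
    "logRelF0-H1-H2_sma3nz", "logRelF0-H1-A3_sma3nz",
    "F1frequency_sma3nz", "F1bandwidth_sma3nz", "F1amplitudeLogRelF0_sma3nz",
    "F2frequency_sma3nz", "F2bandwidth_sma3nz", "F2amplitudeLogRelF0_sma3nz",
    "F3frequency_sma3nz", "F3bandwidth_sma3nz", "F3amplitudeLogRelF0_sma3nz",
]


def extract_taps(wav_path):
    """Return a pandas DataFrame with one row per frame (hypothetical helper)."""
    # Imported lazily so the name list above is usable without the package.
    # Install with: pip install opensmile
    import opensmile

    smile = opensmile.Smile(
        # Assumption: eGeMAPSv02 is the feature set behind the list above.
        feature_set=opensmile.FeatureSet.eGeMAPSv02,
        feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
    )
    return smile.process_file(wav_path)
```

If the assumption holds, the columns of the returned DataFrame should line up with `TAP_NAMES`; it is worth checking this against your own openSMILE version before relying on it.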
Use TAPLoss in your own project
- You'll need TAPLoss.py and TAP_estimator.py, located under TAPLoss/. The same directory also contains a .pt file, which is the pretrained TAP estimator model checkpoint.
from TAPLoss import AcousticLoss
- Initialize TAPLoss
TAPloss = AcousticLoss(loss_type, acoustic_model_path, paap, paap_weight_path)
# loss_type: choose one from ["l2", "l1", "frame_energy_weighted_l2", "frame_energy_weighted_l1"]
# acoustic_model_path: path to the .pt TAP pretrained estimator model checkpoint.
# paap: set to True if you want to use PAAPLoss, default is False.
# paap_weight_path: path to the PAAP weights .npy file, must be specified if paap == True.
- Call TAPloss as a function
loss = TAPloss(clean_waveform, enhan_waveform, mode)
# clean_waveform: waveform of clean audio, sampled at 16 kHz.
# enhan_waveform: waveform of enhanced audio, sampled at 16 kHz.
# mode: "train" if your enhancement model is in train mode,
# "eval" if your enhancement model is in eval mode,
# gradients won't be calculated in eval mode.
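One way to use the loss during training is as an auxiliary term added to your enhancement model's existing loss. The sketch below shows that pattern under a few assumptions: `combined_loss`, `base_loss_fn`, and `tap_weight` are hypothetical names that are not part of this repository, and the call is assumed to return a scalar that can be summed with another loss. The helper is framework-agnostic so it can be checked without PyTorch installed.

```python
def combined_loss(base_loss_fn, tap_loss_fn, clean, enhanced,
                  tap_weight=1.0, mode="train"):
    """Add a weighted TAPLoss term to an existing enhancement loss.

    base_loss_fn: called as base_loss_fn(enhanced, clean), following the
                  (input, target) convention of PyTorch loss functions.
    tap_loss_fn:  called as tap_loss_fn(clean, enhanced, mode), matching
                  the TAPloss call shown above.
    tap_weight:   hypothetical scaling factor; tune it for your model.
    """
    base = base_loss_fn(enhanced, clean)
    tap = tap_loss_fn(clean, enhanced, mode)
    return base + tap_weight * tap


# In a real training step (requires PyTorch and the TAPLoss module), this
# might look like the following; the weight 0.1 is only a placeholder:
#   TAPloss = AcousticLoss(loss_type, acoustic_model_path, paap, paap_weight_path)
#   enhanced = model(noisy)
#   loss = combined_loss(torch.nn.functional.l1_loss, TAPloss,
#                        clean, enhanced, tap_weight=0.1)
#   loss.backward()
```

Keeping the weight as an explicit argument makes it easy to sweep during tuning, and passing "eval" as `mode` during validation follows the note above about gradients.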
Related Resources
More details about the official implementation of PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement can be found at https://github.com/muqiaoy/PAAP.
Citation
Please cite our paper if you use this code for your research.
@article{zeng2023taploss,
title={TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement},
author={Zeng, Yunyang and Konan, Joseph and Han, Shuo and Bick, David and Yang, Muqiao and Kumar, Anurag and Watanabe, Shinji and Raj, Bhiksha},
journal={arXiv preprint arXiv:2302.08088},
year={2023}
}