
Perceptual Contrast Stretching on Target Feature for Speech Enhancement (Accepted by INTERSPEECH 2022)


Perceptual Contrast Stretching on Target Feature for Speech Enhancement

This repo is dedicated only to post-processing PCS.

Contents

Introduction
PCS-tools
Scoring-tools
Online Post-processing PCS Demo
Citation
References

For speech enhancement (SE) systems that use a 400-sample window in the short-time Fourier transform (STFT), we recommend PCS400 instead of PCS. This avoids distortion caused by a mismatched STFT configuration.

Introduction

"PCS is derived based on the critical band importance function and applied to modify the targets of the SE model."
"It can also be used as a post-processing (PP) method to further sharpen the structure of enhanced speech and suppress residual noise."
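As a rough illustration of the idea (not the code in this repo), contrast stretching can be pictured as a per-frequency-bin weight applied in a compressed magnitude domain. The sketch below assumes a `log1p` magnitude representation and a hypothetical weight vector `gammas` with one value per frequency bin; the actual PCS weights are derived from the critical band importance function in the paper and are not reproduced here.

```python
import numpy as np


def contrast_stretch(magnitude, gammas):
    """Stretch a magnitude spectrogram of shape (n_bins, n_frames).

    Each frequency bin k is scaled in the log1p domain by gammas[k]:
    gammas[k] > 1 sharpens that bin, gammas[k] == 1 leaves it untouched.
    NOTE: `gammas` is a hypothetical per-bin weight vector, not the published PCS table.
    """
    magnitude = np.asarray(magnitude, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    if magnitude.ndim != 2 or gammas.shape != (magnitude.shape[0],):
        raise ValueError("expected magnitude (n_bins, n_frames) and one gamma per bin")
    compressed = np.log1p(magnitude)           # compress the dynamic range
    stretched = gammas[:, None] * compressed   # per-bin contrast stretch
    return np.expm1(stretched)                 # back to the linear magnitude domain


# Toy example: 3 frequency bins x 2 frames; only the middle bin is stretched.
mag = np.array([[0.0, 1.0], [2.0, 3.0], [0.5, 0.0]])
print(contrast_stretch(mag, [1.0, 1.2, 1.0]))
```

Because `expm1(g * log1p(m)) = (1 + m)**g - 1`, a weight above 1 enlarges a bin's magnitude and a weight of 1 returns it unchanged, so the weight vector alone decides which frequency regions are emphasized.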

More details can be found in the paper: http://arxiv.org/abs/2203.17152 (arXiv preprint; accepted by INTERSPEECH 2022)


Enhanced audio was generated by several baseline models, and post-processing PCS was then applied to their outputs.
The experimental results are as follows:

Some examples are shown below:

PCS-tools

The post-processing PCS tools can be found in the /PCS and /PCS400 folders.
You can use them to post-process enhanced audio with PCS.
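A minimal end-to-end sketch of post-processing a waveform follows. It assumes a 16 kHz mono signal, a 512-point STFT with a 256-sample hop and a Hann window, and a hypothetical per-bin weight vector `gammas`; these settings and the use of `scipy.signal.stft`/`istft` are assumptions for illustration, not the configuration of the scripts in /PCS. Only the magnitude is stretched; the phase of the input is reused unchanged.

```python
import numpy as np
from scipy.signal import istft, stft


def pcs_postprocess(wave, gammas, fs=16000, n_fft=512, hop=256):
    """Apply per-bin contrast stretching to an enhanced waveform (sketch).

    `gammas` is a hypothetical weight vector with one value per STFT bin
    (n_fft // 2 + 1 values); it is NOT the published PCS table.
    """
    wave = np.asarray(wave, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    if gammas.shape != (n_fft // 2 + 1,):
        raise ValueError(f"need {n_fft // 2 + 1} weights for n_fft={n_fft}")

    # Analysis: complex spectrogram of shape (n_bins, n_frames).
    _, _, spec = stft(wave, fs=fs, window="hann", nperseg=n_fft, noverlap=n_fft - hop)
    magnitude, phase = np.abs(spec), np.angle(spec)

    # Stretch only the magnitude, in the log1p domain, one weight per bin.
    stretched = np.expm1(gammas[:, None] * np.log1p(magnitude))

    # Synthesis: recombine with the original phase and invert the STFT.
    _, out = istft(stretched * np.exp(1j * phase), fs=fs, window="hann",
                   nperseg=n_fft, noverlap=n_fft - hop)
    return out[: len(wave)]  # trim the padding added during analysis


# Usage with unit weights: the output should match the input up to numerical error.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
y = pcs_postprocess(x, np.ones(257))
print(np.max(np.abs(x - y)))
```

Running it with all weights set to 1 is a quick sanity check that the STFT round trip itself introduces no distortion before any stretching is applied.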

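One way to see why the STFT window length matters for the PCS vs. PCS400 choice: if stretching weights are stored per frequency bin, a table built for one FFT size does not line up with another. The sketch below assumes a 16 kHz sampling rate and an FFT size equal to the frame length (512 vs. 400 samples). It illustrates the bin-grid mismatch only and does not describe how PCS400 was derived; `resample_weights` is a hypothetical helper.

```python
import numpy as np


def bin_grid(n_fft, fs=16000):
    """Center frequency (Hz) of each one-sided STFT bin."""
    return np.arange(n_fft // 2 + 1) * fs / n_fft


def resample_weights(weights, n_fft_src, n_fft_dst, fs=16000):
    """Hypothetical helper: map per-bin weights from one FFT size to another
    by linear interpolation over frequency (illustration only)."""
    return np.interp(bin_grid(n_fft_dst, fs), bin_grid(n_fft_src, fs), weights)


grid512, grid400 = bin_grid(512), bin_grid(400)
print(len(grid512), grid512[1])  # → 257 31.25
print(len(grid400), grid400[1])  # → 201 40.0
```

A 257-entry table indexed against a 201-bin spectrogram would apply each weight at the wrong frequency (bin k means k × 31.25 Hz in one case and k × 40 Hz in the other), which is one plausible source of the mismatch distortion; a table defined for the 400-sample setting avoids it.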

Scoring-tools

Speech metric scores were computed with /speech_metrics.

Online Post-processing PCS Demo

https://lojoffy-pcs-online-demo-main-luu0rc.streamlitapp.com/

Citation

If you find the code useful in your research, please cite:

@article{chao2022perceptual,
  title={Perceptual Contrast Stretching on Target Feature for Speech Enhancement},
  author={Chao, Rong and Yu, Cheng and Fu, Szu-Wei and Lu, Xugang and Tsao, Yu},
  journal={Proc. of INTERSPEECH},
  year={2022}
}

References

SEGAN
arXiv: https://arxiv.org/pdf/1703.09452.pdf

Wiener filter
Wikipedia: https://en.wikipedia.org/wiki/Wiener_filter

Transformer T(c) / T(nc)
arXiv: https://arxiv.org/pdf/2006.10296.pdf

CRNN
arXiv: https://arxiv.org/pdf/1805.00579.pdf

MetricGAN+
arXiv: https://arxiv.org/pdf/2104.03538.pdf
SpeechBrain model: https://huggingface.co/speechbrain/metricgan-plus-voicebank

DPT-FSNet
arXiv: https://arxiv.org/pdf/2104.13002.pdf
Reproduced and denoted as DPT*