A curated list of awesome Speech Enhancement tutorials, papers, libraries, datasets, tools, scripts and results, for researchers and practitioners. The purpose of this repo is to organize the world's resources for speech enhancement and make them universally accessible and useful.
This repo is jointly contributed by Nana Hou (Nanyang Technological University), Meng Ge and Hao Shi (Tianjin University), Chenglin Xu (National University of Singapore), and Chen Weiguang (Hunan University).
To add items to this page, simply send a pull request.
Publications
Coming soon...
Survey
A literature survey on single channel speech enhancement, 2020 [paper]
Research Advances and Perspectives on the Cocktail Party Problem and Related Auditory Models, Bo Xu, 2019 [paper (Chinese)]
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments, Zixing Zhang, 2017 [paper]
Supervised speech separation based on deep learning: An Overview, 2017 [paper]
A review on speech enhancement techniques, 2015 [paper]
Nonlinear speech enhancement: an overview, 2007 [paper]
Feature augmentation
Speech enhancement using self-adaptation and multi-head attention, ICASSP 2020 [paper]
PAN: phoneme-aware network for monaural speech enhancement, ICASSP 2020 [paper]
This corpus is from the REVERB challenge 2014. The challenge assumes the scenario of capturing utterances spoken by a single stationary distant-talking speaker with 1-channel (1ch), 2-channel (2ch), or 8-channel (8ch) microphone arrays in reverberant meeting rooms. It features both real recordings and simulated data, part of which simulates the real recordings.
The Diverse Environments Multichannel Acoustic Noise Database provides a set of recordings that allow testing of algorithms against real-world noise in a variety of settings.
A noise bank for simulating noisy data from clean speech. The N1-N100 noises were collected by Guoning Hu, and the other 15 home-made noise types were recorded by USTC.
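Simulating noisy data typically means mixing a clean utterance with a noise clip at a chosen signal-to-noise ratio. A minimal NumPy sketch of that mixing step (the function name `mix_at_snr` is our own illustration, not part of any corpus tooling):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Mix a clean utterance with a noise clip at a target SNR in dB.

    Both inputs are 1-D float arrays at the same sampling rate; the noise
    is tiled or truncated to match the length of the clean signal.
    """
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(clean_power / scaled_noise_power) == snr_db
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise
```

In practice the clean and noise waveforms would be loaded from the corpus files above, and the SNR is drawn from a range (e.g. 0-20 dB) to diversify the training data.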
A database of simulated and real room impulse responses, plus isotropic and point-source noises. All audio files are at a 16 kHz sampling rate with 16-bit precision. The data includes all the room impulse responses (RIRs) and noises used in the paper "A Study on Data Augmentation of Reverberant Speech for Robust Speech Recognition" (ICASSP 2017): the real RIRs and isotropic noises from the RWCP sound scene database, the REVERB challenge 2014 database, and the Aachen impulse response database (AIR); simulated RIRs generated by the authors; and point-source noises extracted from the MUSAN corpus.
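The augmentation described in that paper amounts to convolving clean speech with an RIR to simulate a reverberant recording. A minimal sketch of that step (the `reverberate` helper and its normalization choice are our own illustration, under the assumption that both signals share a sampling rate):

```python
import numpy as np

def reverberate(clean, rir):
    """Simulate a reverberant recording by convolving clean speech with a
    room impulse response (RIR).

    The convolution is truncated to the clean signal's length and
    renormalized to the clean peak level, a common choice when preparing
    aligned training pairs for enhancement/dereverberation models.
    """
    reverberant = np.convolve(clean, rir)[:len(clean)]
    peak = np.max(np.abs(reverberant))
    if peak > 0:
        reverberant = reverberant * (np.max(np.abs(clean)) / peak)
    return reverberant
```

For long RIRs, an FFT-based convolution (e.g. `scipy.signal.fftconvolve`) is usually substituted for `np.convolve` for speed.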
SOTA results
SOTA results on the dataset released by the University of Edinburgh. The following methods are all trained on "trainset_28spk" and evaluated on the common test set. ("F" denotes frequency-domain and "T" denotes time-domain.)