# Awesome Speech Enhancement

This is a curated list of awesome speech enhancement tutorials, papers, libraries, datasets, tools, scripts and results. The purpose of this repo is to organize the world's resources for speech enhancement and make them universally accessible and useful.

To add items to this page, simply send a pull request. (contributing guide)
| Link | Language | Description |
| --- | --- | --- |
| SETK | Python & C++ | SETK: Speech Enhancement Tools integrated with Kaldi. |
| pyAudioAnalysis | Python | Python audio analysis library: feature extraction, classification, segmentation and applications. |
| Beamformer | Python | Implementation of mask-based adaptive beamformers (MVDR, GEVD, MCWF). |
| Time-frequency mask | Python | Computation of time-frequency masks (PSM, IRM, IBM, IAM, ...) used as neural-network training labels. |
| SSL | Python | Implementation of sound source localization. |
| Data format | Python | Format conversion between Kaldi, NumPy and MATLAB. |
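Several of the tools above produce time-frequency masks such as the IRM and IBM as training targets. As a hedged illustration (a minimal numpy/scipy sketch, not the actual SETK implementation), both masks can be computed from the STFTs of the clean and noise signals:

```python
# Minimal sketch of IRM/IBM computation from clean speech and noise.
# The random signals here are stand-ins for real recordings.
import numpy as np
from scipy.signal import stft

fs = 16000
rng = np.random.default_rng(0)
clean = rng.standard_normal(fs)          # stand-in for a clean utterance
noise = 0.5 * rng.standard_normal(fs)    # stand-in for additive noise

_, _, S = stft(clean, fs=fs, nperseg=512)   # clean spectrogram
_, _, N = stft(noise, fs=fs, nperseg=512)   # noise spectrogram

# Ideal Ratio Mask: speech energy over total energy per TF bin, in [0, 1].
irm = np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-12)

# Ideal Binary Mask: 1 where speech dominates noise (0 dB local SNR
# threshold), else 0.
ibm = (np.abs(S) > np.abs(N)).astype(np.float32)
```

Multiplying either mask with the noisy spectrogram and inverting the STFT gives an oracle-enhanced signal, which is the usual upper-bound baseline in masking papers.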
| Link | Language | Description |
| --- | --- | --- |
| PESQ etc. | Matlab | Evaluation of PESQ, CSIG, CBAK, COVL and STOI. |
| SNR, LSD | Python | Evaluation of signal-to-noise ratio and log-spectral distance. |
| SDR | Matlab | Evaluation of signal-to-distortion ratio. |
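Two of the simpler metrics above, SNR and LSD, can be sketched in plain numpy (a hedged illustration of the common definitions, not the code behind the linked tools):

```python
# SNR in dB and log-spectral distance (LSD), numpy-only sketches.
import numpy as np

def snr_db(clean, estimate):
    """Signal-to-noise ratio in dB between a clean reference and an estimate."""
    noise = clean - estimate
    return 10 * np.log10(np.sum(clean ** 2) / (np.sum(noise ** 2) + 1e-12))

def lsd(ref_spec, est_spec, eps=1e-12):
    """Log-spectral distance between two magnitude spectrograms (freq x time):
    RMS of the log-power difference over frequency, averaged over frames."""
    log_diff = 10 * np.log10((ref_spec ** 2 + eps) / (est_spec ** 2 + eps))
    return np.mean(np.sqrt(np.mean(log_diff ** 2, axis=0)))

# Example: a 440 Hz tone corrupted by light white noise.
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = clean + 0.1 * np.random.default_rng(0).standard_normal(16000)
```

Note that PESQ, CSIG, CBAK and COVL are far more involved (perceptual models and regressions), which is why dedicated implementations such as the linked MATLAB code are normally used for them.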
## Audio feature extraction

| Link | Language | Description |
| --- | --- | --- |
| LPS | Python | Extraction of log-power spectrum, magnitude spectrum and log-magnitude spectrum, plus cepstral mean and variance normalization. |
| MFCC | Python | This library provides common speech features for ASR, including MFCCs and filterbank energies. |
| pyAudioAnalysis | Python | Python audio analysis library: feature extraction, classification, segmentation and applications. |
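The LPS pipeline above (log-power spectrum followed by mean/variance normalization) can be sketched in a few lines of numpy/scipy; this is an illustrative sketch of the standard recipe, not the linked tool's code:

```python
# Log-power-spectrum features with per-frequency mean/variance normalization.
import numpy as np
from scipy.signal import stft

def log_power_spectrum(wave, fs=16000, nperseg=512):
    """STFT, then log of the power spectrum; returns (freq, time)."""
    _, _, Z = stft(wave, fs=fs, nperseg=nperseg)
    return np.log(np.abs(Z) ** 2 + 1e-12)

def mvn(feats):
    """Per-frequency mean/variance normalization over the time axis,
    as used for CMVN-style feature normalization."""
    mu = feats.mean(axis=1, keepdims=True)
    sigma = feats.std(axis=1, keepdims=True) + 1e-12
    return (feats - mu) / sigma

wave = np.random.default_rng(0).standard_normal(16000)  # stand-in waveform
lps = mvn(log_power_spectrum(wave))
```

Enhancement networks trained on LPS features typically predict either the clean LPS directly or a mask applied to the noisy one.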
| Link | Language | Description |
| --- | --- | --- |
| Data simulation | Python | Add reverberation, add noise, or mix speakers. |
| RIR simulation | Python | Generation of room impulse responses (RIRs) using the image method. |
| pyroomacoustics | Python | Pyroomacoustics is a package for audio signal processing in indoor applications. |
| gpuRIR | Python | Python library for room impulse response (RIR) simulation with GPU acceleration. |
| rir_simulator_python | Python | Room impulse response simulator written in Python. |
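The core of noise-based data simulation is mixing clean speech with noise at a requested SNR. A minimal sketch, assuming numpy only (the linked tools add reverberation and multi-speaker mixing on top of this):

```python
# Mix clean speech with noise at a target SNR by rescaling the noise.
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mixture has the given SNR in dB."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale so that clean_power / (scale**2 * noise_power) == 10**(snr_db/10).
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # stand-in speech
noisy = mix_at_snr(clean, rng.standard_normal(16000), snr_db=5.0)
```

Training sets are usually built by sweeping `snr_db` over a range (e.g. 0 to 15 dB) and sampling noise segments from banks such as those listed below.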
## Speech enhancement datasets (sorted by usage frequency in papers)

| Name | Utterances | Speakers | Language | Pricing | Additional information |
| --- | --- | --- | --- | --- | --- |
| Dataset by University of Edinburgh (2016) | 35K+ | 86 | English | Free | Noisy speech database for training speech enhancement algorithms and TTS models. |
| TIMIT (1993) | 6K+ | 630 | English | $250.00 | The TIMIT corpus of read speech is one of the earliest speaker recognition datasets. |
| VCTK (2009) | 43K+ | 109 | English | Free | Most texts were selected from a newspaper, plus the Rainbow Passage and an elicitation paragraph intended to identify the speaker's accent. |
| WSJ0 (1993) | -- | 149 | English | $1500 | The WSJ database was generated from a machine-readable corpus of Wall Street Journal news text. |
| LibriSpeech (2015) | 292K | 2K+ | English | Free | Large-scale (1000 hours) corpus of read English speech. |
| CHiME series (~2020) | -- | -- | English | Free | Published as part of the CHiME Speech Separation and Recognition Challenge. |
## Augmentation noise sources (sorted by usage frequency in papers)

| Name | Noise types | Pricing | Additional information |
| --- | --- | --- | --- |
| DEMAND (2013) | 18 | Free | The Diverse Environments Multichannel Acoustic Noise Database provides recordings for testing algorithms against real-world noise in a variety of settings. |
| 115 Noise (2015) | 115 | Free | A noise bank for simulating noisy data from clean speech. Noises N1-N100 were collected by Guoning Hu; the other 15 home-made noise types were recorded by USTC. |
| NoiseX-92 (1996) | 15 | Free | Database of recordings of various noises, available on 2 CD-ROMs. |
## SOTA results on the dataset by University of Edinburgh

The following methods are all trained on "trainset_28spk" and evaluated on the common test set. ("F" denotes frequency domain; "T" denotes time domain.)

| Methods | Publish | Domain | PESQ | CSIG | CBAK | COVL | SegSNR | STOI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Noisy | -- | -- | 1.97 | 3.35 | 2.44 | 2.63 | 1.68 | 0.91 |
| Wiener | -- | -- | 2.22 | 3.23 | 2.68 | 2.67 | 5.07 | -- |
| SEGAN | INTERSPEECH 2017 | T | 2.16 | 3.48 | 2.94 | 2.80 | 7.73 | 0.93 |
| CNN-GAN | APSIPA 2018 | F | 2.34 | 3.55 | 2.95 | 2.92 | -- | 0.93 |
| WaveUnet | arXiv 2018 | T | 2.40 | 3.52 | 3.24 | 2.96 | 9.97 | -- |
| WaveNet | ICASSP 2018 | T | -- | 3.62 | 3.24 | 2.98 | -- | -- |
| U-net | ISMIR 2017 | F | 2.48 | 3.65 | 3.21 | 3.05 | 9.34 | -- |
| MSE-GAN | ICASSP 2018 | F | 2.53 | 3.80 | 3.12 | 3.14 | -- | 0.93 |
| DFL | INTERSPEECH 2019 | T | -- | 3.86 | 3.33 | 3.22 | -- | -- |
| DFL (reimplemented) | ICLR 2019 | T | 2.51 | 3.79 | 3.27 | 3.14 | 9.86 | -- |
| TasNet | TASLP 2019 | T | 2.57 | 3.80 | 3.29 | 3.18 | 9.65 | -- |
| MDPhD | arXiv 2018 | T&F | 2.70 | 3.85 | 3.39 | 3.27 | 10.22 | -- |
| Complex U-net | INTERSPEECH 2019 | F | 3.24 | 4.34 | 4.10 | 3.81 | 16.85 | -- |
| Complex U-net (reimplemented) | arXiv 2019 | F | 2.87 | 4.12 | 3.47 | 3.51 | 9.96 | -- |
| SDR-PESQ | arXiv 2019 | F | 3.01 | 4.09 | 3.54 | 3.55 | 10.44 | -- |
| RHRnet | ICASSP 2020 | T | 3.20 | 4.37 | 4.02 | 3.82 | 14.71 | 0.98 |
## Learning materials

- A Study on WaveNet, GANs and General CNN-RNN Architectures, 2019 [link]
- Deep Learning: Methods and Applications, 2016 [link]
- Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville, 2016 [link]
- Robust Automatic Speech Recognition by Jinyu Li and Li Deng, 2015 [link]
- CCF speech seminar, 2020 [link]
- Real-time Single-channel Speech Enhancement with Recurrent Neural Networks by Microsoft Research, 2019 [link]
- Deep learning in speech by Hongyi Li, 2019 [link]
- High-Accuracy Neural-Network Models for Speech Enhancement, 2017 [link]
- DNN-Based Online Speech Enhancement Using Multitask Learning and Suppression Rule Estimation, 2015 [link]
- Microphone array signal processing: beyond the beamformer, 2011 [link]
- Learning-based approach to speech enhancement and separation (INTERSPEECH tutorial, 2016) [link]
- Deep learning for speech/language processing (INTERSPEECH tutorial by Li Deng, 2015) [link]
- Speech enhancement algorithms (Stanford University, 2013) [link]