This repository summarizes papers, code, and tools for single-/multi-channel speech enhancement and speech separation. It aims to collect open-source projects rather than to be an exhaustive list of papers. Pull requests are welcome.
- Speech_Enhancement
- Dereverberation
- Speech_Separation
- Array_Signal_Processing
- Sound_Event_Detection
- Tools
- Resources
## Speech_Enhancement
- On Training Targets for Supervised Speech Separation, Wang, 2014. [Paper]
- A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement, Valin, 2018. [Paper] [RNNoise]
- A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech, Valin, 2020. [Paper] [PercepNet]
- Other IRM-based SE repositories: [IRM-SE-LSTM] [nn-irm] [rnn-se] [DL4SE] (an oracle IRM sketch appears at the end of this section)
- An Experimental Study on Speech Enhancement Based on Deep Neural Networks, Xu, 2014. [Paper]
- A Regression Approach to Speech Enhancement Based on Deep Neural Networks, Xu, 2014. [Paper] [sednn] [DNN-SE-Xu] [DNN-SE-Li]
- Other DNN magnitude spectrum mapping-based SE repositories: [SE toolkit] [TensorFlow-SE] [UNetSE]
- Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR, Weninger, 2015. [Paper]
- Long short-term memory for speaker generalization in supervised speech separation, Chen, 2017. [Paper]
- Online Monaural Speech Enhancement using Delayed Subband LSTM, Li, 2020. [Paper]
- FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement, Hao, 2020. [Paper] [FullSubNet]
- A Fully Convolutional Neural Network for Speech Enhancement, Park, 2016. [Paper] [CNN4SE]
- A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement, Tan, 2018. [Paper] [CRN-Tan]
- Convolutional-Recurrent Neural Networks for Speech Enhancement, Zhao, 2018. [Paper] [CRN-Hao]
- Complex spectrogram enhancement by convolutional neural network with multi-metrics learning, Fu, 2017. [Paper]
- Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement, Tan, 2020. [Paper] [GCRN]
- Phase-aware Speech Enhancement with Deep Complex U-Net, Choi, 2019. [Paper] [DC-UNet]
- DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement, Hu, 2020. [Paper] [DCCRN]
- T-GSA: Transformer with Gaussian-Weighted Self-Attention for Speech Enhancement, Kim, 2020. [Paper]
- PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network, Yin, 2019. [Paper] [PHASEN]
- Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising, Williamson, 2017. [Paper]
- Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net, Choi, 2020. [Paper]
- Real Time Speech Enhancement in the Waveform Domain, Defossez, 2020. [Paper] [facebookDenoiser]
- Improved Speech Enhancement with the Wave-U-Net, Macartney, 2018. [Paper] [WaveUNet]
- Monaural speech enhancement through deep wave-U-net, Guimarães, 2020. [Paper] [SEWUNet]
- A New Framework for CNN-Based Speech Enhancement in the Time Domain, Pandey, 2019. [Paper]
- Speech Enhancement Using Dilated Wave-U-Net: an Experimental Analysis, Ali, 2020. [Paper]
- TCNN: Temporal Convolutional Neural Network for Real-time Speech Enhancement in the Time Domain, Pandey, 2019. [Paper]
- Densely Connected Neural Network with Dilated Convolutions for Real-Time Speech Enhancement in the Time Domain, Pandey, 2020. [Paper] [DDAEC]
- Dense CNN With Self-Attention for Time-Domain Speech Enhancement, Pandey, 2021. [Paper]
- Dual-path Self-Attention RNN for Real-Time Speech Enhancement, Pandey, 2021. [Paper]
- SEGAN: Speech Enhancement Generative Adversarial Network, Pascual, 2017. [Paper] [SEGAN]
- SERGAN: Speech enhancement using relativistic generative adversarial networks with gradient penalty, Baby, 2019. [Paper] [SERGAN]
- MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement, Fu, 2019. [Paper] [MetricGAN]
- MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement, Fu, 2021. [Paper] [MetricGAN+]
- HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks, Su, 2020. [Paper] [HifiGAN]
- Deep Xi as a Front-End for Robust Automatic Speech Recognition, Nicolson, 2019. [Paper] [DeepXi]
- Deep Residual-Dense Lattice Network for Speech Enhancement, Nikzad, 2020. [Paper] [RDL-SE]
- DeepMMSE: A Deep Learning Approach to MMSE-based Noise Power Spectral Density Estimation, Zhang, 2020. [Paper]
- Using Generalized Gaussian Distributions to Improve Regression Error Modeling for Deep-Learning-Based Speech Enhancement, Li, 2019. [Paper] [SE-MLC]
- Speech Enhancement Using a DNN-Augmented Colored-Noise Kalman Filter, Yu, 2020. [Paper] [DNN-Kalman]
- A Recursive Network with Dynamic Attention for Monaural Speech Enhancement, Li, 2020. [Paper] [DARCN]
- Masking and Inpainting: A Two-Stage Speech Enhancement Approach for Low SNR and Non-Stationary Noise, Hao, 2020. [Paper]
- A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement, Du, 2020. [Paper]
- Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression, Westhausen, 2020. [Paper] [DTLN]
- Listening to Sounds of Silence for Speech Denoising, Xu, 2020. [Paper] [LSS]
- ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network, Li, 2021. [Paper]
- DNS Challenge [DNS Interspeech2020] [DNS ICASSP2021] [DNS Interspeech2021]
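Many of the masking-based entries above (starting with the training-targets paper by Wang, 2014) share the same pipeline: take the STFT of the noisy signal, estimate a time-frequency mask, apply it, and resynthesize with the noisy phase. The sketch below computes an oracle ideal ratio mask with NumPy/SciPy; the clean and noise arrays `s` and `n` are random placeholders, and in the listed systems the mask is estimated by a network from the mixture alone.

```python
# Oracle ideal-ratio-mask (IRM) sketch in the STFT domain.
# `s` and `n` are placeholder signals; real systems estimate the mask
# from the noisy mixture `x` with a trained network.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(fs)        # placeholder "clean speech" (1 s)
n = 0.3 * rng.standard_normal(fs)  # placeholder noise
x = s + n                          # noisy mixture

_, _, S = stft(s, fs, nperseg=512, noverlap=256)
_, _, N = stft(n, fs, nperseg=512, noverlap=256)
_, _, X = stft(x, fs, nperseg=512, noverlap=256)

# IRM as in Wang (2014): square root of the speech-to-total power ratio.
irm = np.sqrt(np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-12))

# Apply the mask to the noisy STFT and reconstruct with the noisy phase.
_, s_hat = istft(irm * X, fs, nperseg=512, noverlap=256)
```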
## Dereverberation
- Collection of papers, datasets and tools on the topic of Speech Dereverberation and Speech Enhancement [Link]
- SPENDRED [Paper] [SPENDRED]
- WPE (MCLP) [Paper] [nara-WPE] (a single-bin WPE sketch appears at the end of this section)
- LP Residual [Paper] [LP_residual]
- dereverberate [Paper] [Code]
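The WPE/MCLP method above removes late reverberation by linearly predicting it, per frequency bin, from delayed STFT frames and subtracting the prediction. Below is a minimal single-channel, single-bin sketch in NumPy; the function name `wpe_one_bin` and the tap/delay/iteration values are illustrative assumptions, and nara-WPE is the reference implementation to use in practice.

```python
import numpy as np

def wpe_one_bin(y, taps=10, delay=3, iters=3, eps=1e-8):
    """Dereverberate one complex STFT frequency bin y of shape (T,)."""
    T = y.shape[0]
    x = y.copy()
    # Stacked, delayed observations: row t holds [y[t-delay], ..., y[t-delay-taps+1]].
    Y_tilde = np.zeros((T, taps), dtype=complex)
    for k in range(taps):
        shift = delay + k
        Y_tilde[shift:, k] = y[:T - shift]
    for _ in range(iters):
        lam = np.maximum(np.abs(x) ** 2, eps)        # time-varying PSD estimate
        A = (Y_tilde.conj().T / lam) @ Y_tilde       # weighted covariance
        b = (Y_tilde.conj().T / lam) @ y             # weighted correlation
        g = np.linalg.solve(A + eps * np.eye(taps), b)
        x = y - Y_tilde @ g                          # subtract predicted late reverb
    return x
```

Applied independently to every frequency bin of a reverberant STFT, this reproduces the core of the iterative WPE algorithm; the multi-channel case stacks all microphones into `Y_tilde`.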
## Speech_Separation
- Tasnet: time-domain audio separation network for real-time, single-channel speech separation [Code]
- Conv-TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation [Code]
- Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation [Code1] [Code2]
- DANet: Deep Attractor Network for single-channel speech separation [Code]
- TAC: end-to-end microphone permutation and number invariant multi-channel speech separation [Code]
- uPIT-for-speech-separation: Speech separation with utterance-level PIT [Code]
- LSTM_PIT_Speech_Separation [Code]
- Deep-Clustering [Code1] [Code2] [Code3]
- sound separation(Google) [Code]
- sound separation: Deep learning based speech source separation using Pytorch [Code]
- music-source-separation [Code]
- Singing-Voice-Separation [Code]
- Comparison-of-Blind-Source-Separation-techniques [Code]
- FastICA [Code]
- A localisation- and precedence-based binaural separation algorithm [Download]
- Convolutive Transfer Function Invariant SDR [Code]
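Separation systems like those above are commonly scored with SDR-style metrics (see the Calculate-SNR-SDR and CI-SDR entries). Below is a minimal scale-invariant SDR (SI-SDR) sketch; it assumes two equal-length 1-D NumPy signals and is an illustration, not a replacement for the evaluation toolkits listed here.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals of equal length."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10((np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps))
```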
## Array_Signal_Processing
- MASP: Microphone Array Speech Processing [Code]
- BeamformingSpeechEnhancer [Code]
- TSENet [Code]
- steernet [Code]
- DNN_Localization_And_Separation [Code]
- nn-gev: Neural network supported GEV beamformer for CHiME-3 [Code]
- chime4-nn-mask: Implementation of an NN-based mask estimator in PyTorch (reuses some code from nn-gev) [Code]
- beamformit_matlab: A MATLAB implementation of the CHiME-4 baseline BeamformIt [Code]
- pb_chime5: Speech enhancement system for the CHiME-5 dinner party scenario [Code]
- beamformit: microphone array algorithms [Code] (a minimal delay-and-sum sketch appears at the end of this section)
- Beamforming-for-speech-enhancement [Code]
- deepBeam [Code]
- NN_MASK [Code]
- Cone-of-Silence [Code]
- binauralLocalization [Code]
- robotaudition_examples: Some simplified robot audition examples (sound source localization and separation), coded in Octave/MATLAB [Code]
- WSCM-MUSIC [Code]
- doa-tools [Code]
- Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks [Code] [PDF]
- messl: Model-based EM Source Separation and Localization [Code]
- messlJsalt15: MESSL wrappers etc. for JSALT 2015, including CHiME-3 [Code]
- fast_sound_source_localization_using_TLSSC: Fast Sound Source Localization Using Two-Level Search Space Clustering [Code]
- Binaural-Auditory-Localization-System [Code]
- Binaural_Localization: ITD-based localization of sound sources in complex acoustic environments [Code]
- Dual_Channel_Beamformer_and_Postfilter [Code]
- Microphone sound source localization [Code]
- RTF-based-LCMV-GSC [Code]
- DOA [Code]
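As a baseline for the beamforming repositories above, a frequency-domain delay-and-sum beamformer simply phase-aligns and averages the microphone channels for an assumed direction of arrival. The sketch below uses NumPy/SciPy; the function name `delay_and_sum`, the uniform-linear-array geometry, the 5 cm microphone spacing, and the STFT settings are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def delay_and_sum(multichannel, fs, doa_deg, mic_spacing=0.05, c=343.0):
    """Delay-and-sum beamforming; `multichannel` has shape (num_mics, num_samples)."""
    num_mics = multichannel.shape[0]
    f, _, X = stft(multichannel, fs, nperseg=512, noverlap=256)   # X: (M, F, T)
    # Plane-wave delays relative to microphone 0 (broadside = 90 degrees).
    delays = np.arange(num_mics) * mic_spacing * np.cos(np.deg2rad(doa_deg)) / c
    # Compensate each microphone's phase shift, then average across channels.
    steer = np.exp(2j * np.pi * f[None, :] * delays[:, None])     # (M, F)
    Y = np.mean(steer[:, :, None] * X, axis=0)                    # (F, T)
    _, y = istft(Y, fs, nperseg=512, noverlap=256)
    return y

# Illustrative call: 4 microphones, 16 kHz, steer toward 60 degrees.
# y = delay_and_sum(np.random.randn(4, 16000), 16000, doa_deg=60.0)
```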
## Sound_Event_Detection
- sed_eval - Evaluation toolbox for Sound Event Detection [Code]
- Benchmark for sound event localization task of DCASE 2019 challenge [Code]
- sed-crnn: DCASE 2017 real-life sound event detection winning method [Code]
- seld-net [Code]
## Tools
- APS: A workspace for single/multi-channel speech recognition, enhancement and separation [Code]
- AKtools: the open software toolbox for signal acquisition, processing, and inspection in acoustics [SVN Code] (username: aktools; password: ak)
- espnet [Code]
- asteroid: The PyTorch-based audio source separation toolkit for researchers [PDF] [Code]
- pytorch_complex [Code]
- ONSSEN: An Open-source Speech Separation and Enhancement Library [Code]
- separation_data_preparation [Code]
- MatlabToolbox [Code]
- athena-signal [[Code]](https://github.com/athena-team/athena-signal)
- python_speech_features [Code] (a feature-extraction sketch appears at the end of this section)
- speechFeatures: some basic features for speech processing and sound source localization [Code]
- sap-voicebox [Code]
- Calculate-SNR-SDR [Code]
- RIR-Generator [Code]
- Python library for Room Impulse Response (RIR) simulation with GPU acceleration [Code]
- ROOMSIM: binaural image source simulation [Code]
- binaural-image-source-model [Code]
- PESQ [Code]
- SETK: Speech Enhancement Tools integrated with Kaldi [Code]
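As a quick start with the feature tools above, the sketch below extracts MFCC and log mel-filterbank features with python_speech_features; the wav path `example.wav` is a placeholder, a mono 16-bit PCM file is assumed, and the parameter values are simply the package defaults written out.

```python
from scipy.io import wavfile
from python_speech_features import mfcc, logfbank

rate, sig = wavfile.read("example.wav")   # placeholder path, mono file assumed
mfcc_feat = mfcc(sig, samplerate=rate, winlen=0.025, winstep=0.01, numcep=13)
fbank_feat = logfbank(sig, samplerate=rate, nfilt=26)
print(mfcc_feat.shape, fbank_feat.shape)  # (num_frames, 13), (num_frames, 26)
```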