- This dataset contains a large collection of clean speech files and a variety of environmental noise files in .wav format, sampled at 16 kHz.
- The main application of this dataset is training Deep Neural Network (DNN) models to suppress background noise, but it can also be used for other audio and speech applications.
- We provide a recipe for mixing clean speech and noise at various signal-to-noise ratio (SNR) conditions to generate a large noisy speech dataset.
- The SNR conditions and the number of hours of data required can be configured depending on the application requirements.
More info: https://github.com/microsoft/MS-SNSD
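Mixing at a target SNR comes down to scaling the noise so that the ratio of clean-speech power to noise power matches the requested value in dB. The following is a minimal sketch of that idea, not the recipe shipped in the MS-SNSD repository; the function name `mix_at_snr` and its parameters are hypothetical, and it assumes both signals are mono arrays at the same sampling rate.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db, eps=1e-12):
    """Scale `noise` so the clean-to-noise power ratio is `snr_db`, then add it.

    Hypothetical helper for illustration; returns (noisy, scaled_noise).
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if clean.size == 0 or noise.size == 0:
        raise ValueError("clean and noise must be non-empty")

    # Tile the noise if it is shorter than the speech, then trim to length.
    if noise.size < clean.size:
        reps = int(np.ceil(clean.size / noise.size))
        noise = np.tile(noise, reps)
    noise = noise[: clean.size]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)

    # Choose gain so that 10*log10(clean_power / (gain**2 * noise_power)) == snr_db.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10) + eps))
    scaled_noise = gain * noise
    return clean + scaled_noise, scaled_noise
```

Calling `mix_at_snr(clean, noise, snr_db)` over a list of SNR values (for example 0, 10, 20 dB) would produce one noisy variant per condition. A real pipeline would also need to handle clipping and level normalization before writing .wav files.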
Existing objective speech-intelligibility measures are suitable for several types of degradation; however, they turn out to be less appropriate for methods where noisy speech is processed by a time-frequency (TF) weighting, e.g., noise reduction and speech separation. In this paper, we present an objective intelligibility measure that shows high correlation (rho=0.95) with the intelligibility of both noisy and TF-weighted noisy speech. The proposed method shows significantly better performance than three other, more sophisticated, objective measures. Furthermore, it is based on an intermediate intelligibility measure for short-time (approximately 400 ms) TF-regions and uses a simple DFT-based TF-decomposition. In addition, a free Matlab implementation is provided.
More info: https://ieeexplore.ieee.org/document/5495701
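To make the idea of a short-time, DFT-based intermediate measure concrete, here is a simplified sketch in Python. It is not the published measure or its Matlab implementation: it correlates clean and degraded magnitude envelopes per DFT bin over sliding segments and averages the result, and it omits other steps of the published method such as band grouping and clipping. The function names, frame length, hop, and `seg_frames` default are illustrative assumptions; `seg_frames` would need to be chosen so that a segment spans roughly 400 ms at the signal's sampling rate.

```python
import numpy as np


def tf_magnitudes(x, frame_len=256, hop=128):
    """Hann-windowed DFT magnitudes, shape (bins, frames). Illustrative only."""
    x = np.asarray(x, dtype=np.float64)
    if x.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (x.size - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


def short_time_correlation(clean, degraded, seg_frames=30, eps=1e-12):
    """Average per-bin correlation of TF envelopes over sliding segments."""
    X = tf_magnitudes(clean)
    Y = tf_magnitudes(degraded)
    n_frames = min(X.shape[1], Y.shape[1])
    if n_frames < seg_frames:
        raise ValueError("signals are shorter than one segment")

    scores = []
    for end in range(seg_frames, n_frames + 1):
        xs = X[:, end - seg_frames : end]
        ys = Y[:, end - seg_frames : end]
        # Remove the per-bin mean so the score is a sample correlation.
        xs = xs - xs.mean(axis=1, keepdims=True)
        ys = ys - ys.mean(axis=1, keepdims=True)
        num = np.sum(xs * ys, axis=1)
        den = np.linalg.norm(xs, axis=1) * np.linalg.norm(ys, axis=1) + eps
        scores.append(np.mean(num / den))
    return float(np.mean(scores))
```

A score near 1 means the degraded envelopes track the clean ones closely within each segment, and lower scores indicate more distortion. For reported results, use the authors' implementation rather than this sketch.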
This is the repository for DSEGAN and ISEGAN (and the baseline SEGAN) from our original paper:
H. Phan, I. V. McLoughlin, L. Pham, O. Y. Chén, P. Koch, M. De Vos, and A. Mertins, "Improving GANs for Speech Enhancement," IEEE Signal Processing Letters, 2020. (accepted)
More info: https://github.com/pquochuy/idsegan
- Dataset Preparation
- tensorflow_gpu == 1.9
- numpy == 1.1.3
- scipy == 1.0.0