An implementation of the paper "A Fully Convolutional Neural Network for Complex Spectrogram Processing in Speech Enhancement". The code also provides a variant for training on the TED-LIUM dataset.
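For readers unfamiliar with the term, a complex spectrogram keeps both the real and imaginary parts of the short-time Fourier transform instead of the magnitude alone. The sketch below illustrates that representation with SciPy. The function names, the 16 kHz sampling rate, and the window settings are assumptions chosen for illustration; they are not taken from this repository or from the paper.

```python
import numpy as np
from scipy.signal import stft, istft


def complex_spectrogram(wave, fs=16000, nperseg=512, noverlap=256):
    """Return an array of shape (2, freq_bins, frames): real and imaginary parts.

    fs, nperseg and noverlap are illustrative defaults, not the repository's values.
    """
    _, _, Z = stft(wave, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return np.stack([Z.real, Z.imag], axis=0)


def waveform_from_spectrogram(spec, fs=16000, nperseg=512, noverlap=256):
    """Invert the two-channel representation back to a time-domain signal."""
    Z = spec[0] + 1j * spec[1]
    _, wave = istft(Z, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return wave


# Round trip on a synthetic one-second signal (a stand-in for real speech).
rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)
spec = complex_spectrogram(wave)
recon = waveform_from_spectrogram(spec)
```

Because both parts are kept, the waveform can be recovered directly with the inverse transform, without a separate phase estimate. The reconstructed signal may be slightly longer than the input because of padding, so compare only the first `len(wave)` samples.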
- Python 3
- TensorFlow 1.15
- SciPy
- python_speech_features
- librosa
- Training: to train on the TED-LIUM dataset, run train_tedlium.py; otherwise, run train.py.
- Testing: the training code already includes testing, that is, generating noisy speech and obtaining the estimated clean speech from the model. To test with existing noisy wav files, run proc_existing_noisy.py.
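The testing step above involves generating noisy speech from clean speech and a noise recording. A common way to do this is to scale the noise so that the mixture reaches a target signal-to-noise ratio (SNR). The sketch below shows that idea with NumPy only. The function `mix_at_snr` and its parameters are hypothetical and are not the code used in train.py, which may mix signals differently.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech at the requested SNR (in dB).

    Hypothetical helper for illustration; not part of this repository.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat the noise if it is shorter than the speech, then crop to length.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise signal is silent; cannot scale it to a target SNR")

    # Choose the scale so that clean_power / (scale**2 * noise_power) = 10**(snr_db / 10).
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


# Synthetic stand-ins for a speech clip and a shorter noise clip.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(5000)
noisy = mix_at_snr(clean, noise, snr_db=5.0)

# Measure the SNR that was actually achieved.
achieved_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

With real data, `clean` would be a TED-LIUM utterance and `noise` an ESC-50 clip, both loaded at the same sampling rate.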
- WaveNet
- TED-LIUM (speech corpus)
- ESC-50 (environmental sound dataset / noise dataset)
- This model

      @inproceedings{ouyang2019fully,
        title={A fully convolutional neural network for complex spectrogram processing in speech enhancement},
        author={Ouyang, Zhiheng and Yu, Hongjiang and Zhu, Wei-Ping and Champagne, Benoit},
        booktitle={ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
        pages={5756--5760},
        year={2019},
        organization={IEEE}
      }

- TED-LIUM (v2)

      @inproceedings{rousseau2014enhancing,
        title={Enhancing the TED-LIUM corpus with selected data for language modeling and more TED talks.},
        author={Rousseau, Anthony and Del{\'e}glise, Paul and Esteve, Yannick},
        booktitle={LREC},
        pages={3935--3939},
        year={2014}
      }

- ESC-50

      @inproceedings{piczak2015esc,
        title={ESC: Dataset for environmental sound classification},
        author={Piczak, Karol J},
        booktitle={Proceedings of the 23rd ACM international conference on Multimedia},
        pages={1015--1018},
        year={2015}
      }
- Remember to modify config.json with your own setup.
- In this implementation, the model is slightly modified from the one described in the paper.
- Migrate to TensorFlow 2
- Write a detailed README
- Support variable sampling rates