This is an implementation of EHNet [1] in PyTorch and PyTorch Lightning. EHNet is a convolutional-recurrent neural network for single-channel speech enhancement.
- torch 1.4
- pytorch_lightning 0.7.6
- torchaudio 1.4
- soundfile 0.10.3.post1
A dataset containing both clean speech and corresponding noisy speech (i.e. clean speech with noise added) is required. Three notebooks are included to generate such a dataset from separate clean speech recordings and noise recordings.
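The exact mixing procedure is defined in the notebooks; as a rough illustration only, a clean/noisy pair could be created along these lines (file paths, the target SNR, and the mono assumption are placeholders, not the notebooks' actual settings):

```python
import numpy as np
import soundfile as sf

def mix_at_snr(clean_path, noise_path, out_path, snr_db=5.0):
    """Add noise to a clean recording at a target SNR and write the noisy WAV.

    Assumes mono recordings with matching sample rates; the bundled
    notebooks are the authoritative reference for the real procedure.
    """
    clean, sr = sf.read(clean_path)
    noise, noise_sr = sf.read(noise_path)
    assert sr == noise_sr, "clean and noise must share a sample rate here"

    # Loop/trim the noise so it covers the whole clean signal.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    # Scale the noise to reach the requested signal-to-noise ratio.
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    noisy = clean + scale * noise

    sf.write(out_path, noisy, sr)

# Example pairing following the filename convention described below.
mix_at_snr("clean/clean01.wav", "noise/babble.wav", "noisy/clean01+babble.wav")
```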
Running train_nn.py starts the training.
The train_dir variable should point to a folder containing a clean and a noisy subfolder, holding the clean WAV files and the noisy WAV files respectively. The filename of a noisy WAV file must match that of the corresponding clean WAV file, optionally with a suffix delimited by +, e.g. clean01.wav → clean01+noise.wav.
The val_dir variable follows the same convention; this folder is used for validation.
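For example, a dataset following this convention might be laid out like this (folder and file names are illustrative):

```
data/
├── train/
│   ├── clean/
│   │   └── clean01.wav
│   └── noisy/
│       └── clean01+babble.wav
└── val/
    ├── clean/
    │   └── clean02.wav
    └── noisy/
        └── clean02+babble.wav
```

Here train_dir would point to data/train and val_dir to data/val before starting training with `python train_nn.py`.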
Running test_nn.py produces the denoised output WAV files.
The testing_dir variable should point to a folder with the same structure as train_dir and val_dir.
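Once testing_dir points at the test folder, the denoised files are produced by running the script the same way as training:

```
python test_nn.py
```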
[1] H. Zhao, S. Zarar, I. Tashev, and C.-H. Lee, "Convolutional-Recurrent Neural Networks for Speech Enhancement," arXiv:1805.00579 [cs, eess], May 2018.