This is an implementation of NSNet [1] in PyTorch and PyTorch Lightning. NSNet is a recurrent neural network for single-channel speech enhancement. It was implemented as part of my thesis for the Master in Electrical Engineering at Ghent University.
Requirements:
- torch 1.4
- pytorch_lightning 0.7.6
- torchaudio 0.4
- soundfile 0.10.3.post1
A dataset containing both clean speech and corresponding noisy speech (i.e. clean speech with noise added) is required.
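The README does not say how the noisy files were produced. If you need to build such a dataset yourself, one common approach is to scale a noise recording so that the mixture reaches a chosen signal-to-noise ratio (SNR), then add it to the clean signal. The sketch below does this on plain Python lists of samples. The function names mix_at_snr and measure_snr_db, the 16 kHz sample rate and the 5 dB target are illustrative assumptions, not part of this repository, and reading or writing the WAV files (e.g. with soundfile) is left out.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the given SNR in dB.

    Hypothetical helper: clean and noise are equal-length sequences of floats.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    # Mean power of each signal.
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise signal is silent")
    # Choose gain g so that 10 * log10(p_clean / (g**2 * p_noise)) == snr_db.
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measure_snr_db(clean, noisy):
    """SNR of noisy relative to the clean reference, in dB."""
    signal_energy = sum(x * x for x in clean)
    noise_energy = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_energy / noise_energy)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-ins for one second of audio at an assumed 16 kHz rate.
    clean = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [random.gauss(0.0, 1.0) for _ in range(16000)]
    noisy = mix_at_snr(clean, noise, snr_db=5.0)
    print(f"{measure_snr_db(clean, noisy):.2f} dB")  # prints 5.00 dB
```

Mixing each clean file with several noise types or SNR values gives multiple noisy versions of the same utterance, which the + suffix in the filenames can distinguish.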
Running train_nn.py starts the training.
The train_dir variable should hold the path to a folder with two subfolders, clean and noisy, which contain the clean and the noisy WAV files respectively. Each noisy WAV file must have the same filename as its clean counterpart, optionally followed by a suffix delimited by +, e.g. clean01.wav → clean01+noise.wav
The val_dir variable follows the same convention and points to the data used for validation.
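The folder layout and naming convention can be illustrated with a small sketch. This is a hypothetical helper, not the code used in train_nn.py: clean_name_for and pair_files are illustrative names, and the sketch assumes the suffix starts at the first + in the filename and that all files use a lowercase .wav extension.

```python
from pathlib import Path


def clean_name_for(noisy_name):
    """Map a noisy filename to its clean counterpart.

    Assumption: everything from the first '+' up to the extension is the
    optional suffix, e.g. 'clean01+noise.wav' -> 'clean01.wav'.
    """
    path = Path(noisy_name)
    stem = path.stem.split("+", 1)[0]
    return stem + path.suffix


def pair_files(root):
    """Return sorted (clean_path, noisy_path) pairs found under root/clean and root/noisy."""
    root = Path(root)
    pairs = []
    for noisy in sorted((root / "noisy").glob("*.wav")):
        clean = root / "clean" / clean_name_for(noisy.name)
        # Fail early if the naming convention is violated.
        if not clean.exists():
            raise FileNotFoundError(f"no clean file for {noisy.name}: expected {clean}")
        pairs.append((clean, noisy))
    return pairs


# Usage: pairs = pair_files(train_dir), where train_dir is the folder described above.
```

Because several noisy files may map to the same clean file, the pairs are built by iterating over the noisy folder rather than the clean one.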
Running test_nn.py writes the output (denoised) WAV files.
The testing_dir variable should point to a folder with the same structure as train_dir and val_dir.
[1] Y. Xia, S. Braun, C. K. A. Reddy, H. Dubey, R. Cutler, and I. Tashev, “Weighted Speech Distortion Losses for Neural-network-based Real-time Speech Enhancement,” arXiv:2001.10601 [cs, eess], Feb. 2020.