Implementation of Improved Speech Enhancement with the Wave-U-Net.
- Python 3.6
- CUDA 10.1 & CuDNN 7.6
- Please choose the appropriate CUDA and CuDNN version to match your NNabla version
Please install the following packages with pip. (If necessary, install latest pip first.)
- nnabla (v1.1 or later)
- nnabla-ext-cuda (v1.1 or later)
- scipy
- numba
- joblib
- PyQt5
- pyqtgraph (after installing PyQt5)
- pypesq (see "install with pip" on the official site)
  In the latest version, the package name has changed to `pesq`. If you install `pesq`, change the `pypesq` import on line 25 of `wave-u-net.py` to `from pesq import pesq as pypesq`.
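Since the PESQ package may be installed under either name, it can help to check which one is present before editing the import. The following is a minimal sketch using only the standard library; `find_pesq_package` is a hypothetical helper and is not part of this repository.

```python
import importlib.util


def find_pesq_package(candidates=("pesq", "pypesq")):
    """Return the first importable package name in candidates, or None.

    Hypothetical helper: it only checks whether a package can be found,
    it does not import it or call any PESQ function.
    """
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    found = find_pesq_package()
    if found is None:
        print("No PESQ package found; install one with pip.")
    else:
        print("Installed PESQ package:", found)
```

If this reports `pesq`, the import change described above applies; if it reports `pypesq`, the script should work unmodified.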
- `wave-u-net.py`: Main source code. Run this.
- `data.py`: Creates the batch data. Before running, please download the wav dataset as described below.
- `settings.py`: Contains the setting parameters.
- Download `wave-u-net.py`, `settings.py`, and `data.py`, and save them into the same directory.
- In that directory, make three folders: `data`, `pkl`, and `params`.
  - `data` folder: stores the wav data.
  - `pkl` folder: stores the pickled database "~.pkl".
  - `params` folder: stores the parameters, including network models.
- Download the 4 datasets from "Noisy speech database for training speech enhancement algorithms and TTS models" and unzip them:
  https://datashare.is.ed.ac.uk/handle/10283/2791
- Move the 4 unzipped folders into the `data` folder.
- Convert the sampling frequency of all the wav data to 16 kHz. For example, this site is useful. After converting, you can delete the original wav data.
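The folder setup and the 16 kHz conversion above can also be scripted. The sketch below is one possible approach, not the author's method: it assumes NumPy and SciPy (already required) are installed, uses `scipy.signal.resample_poly` for the rate change, and overwrites each wav file in place rather than keeping the originals. The helper names `make_folders`, `resample`, and `convert_folder` are hypothetical.

```python
from math import gcd
from pathlib import Path

import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly

FOLDERS = ("data", "pkl", "params")  # folder names from the steps above


def make_folders(root):
    """Create the data, pkl and params folders under root (hypothetical helper)."""
    root = Path(root)
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return [root / name for name in FOLDERS]


def resample(samples, orig_rate, target_rate=16000):
    """Resample a mono or (frames, channels) array along the time axis."""
    if orig_rate == target_rate:
        return samples
    g = gcd(orig_rate, target_rate)
    out = resample_poly(samples.astype(np.float64),
                        target_rate // g, orig_rate // g, axis=0)
    if np.issubdtype(samples.dtype, np.integer):
        # Round and clip so integer PCM data does not wrap around.
        info = np.iinfo(samples.dtype)
        out = np.clip(np.rint(out), info.min, info.max)
    return out.astype(samples.dtype)


def convert_folder(data_dir, target_rate=16000):
    """Overwrite every .wav under data_dir that is not already at target_rate.

    Returns the number of files that were converted.
    """
    converted = 0
    for path in sorted(Path(data_dir).rglob("*.wav")):
        rate, samples = wavfile.read(str(path))
        if rate != target_rate:
            wavfile.write(str(path), target_rate,
                          resample(samples, rate, target_rate))
            converted += 1
    return converted
```

A typical use would be to call `make_folders(".")` in the directory holding the three scripts, move the unzipped dataset folders into `data`, and then call `convert_folder("data")`. Because the files are overwritten, keep a copy of the originals if you may need them at another sampling rate.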
To train, set `Train = 1` in `wave-u-net.py`.
To predict, set `Train = 0` in `wave-u-net.py`.