Speech compensation (BWE & PLC) based on deep neural networks
/**********************************************************************************************************************************/
This document includes brief instructions on how to build an NN-based processing chain for speech compensation.
- Author: Yupeng Shi, Nengheng Zheng, Yuyong Kang, and Weicong Rong
- e-mail: szucieer@gmail.com
- Date: 05/12/2019
/*********************************************************************************************************************************/
This project is a Python implementation of Speech Compensation (SC) based on deep neural networks plus extra pre- and post-processing.
tensorflow_gpu 0.12.1
librosa 0.6.3 or later
Soundfile 0.10.2 or later
numpy 1.12.1
scipy 0.18.1 or later
tensorboardX 1.2 or later
python 2.7
We suggest installing Anaconda3 on Linux; the dependencies above can then be installed with conda or pip.
a) In this project, you can train the models with impaired samples simulated by low-pass filters and the Opus codec. The NN-based SC system operates at a sample rate of 16 kHz.
b) Some Python scripts are provided for processing the raw data. To generate narrowband signals, refer to boneloss_lowpass.py in ./GANSC. To simulate packet loss, the Opus codec source code can be downloaded from GitHub (https://github.com/xiph/opus). In addition, the ITU-T Software Tool Library (G.191) (https://github.com/openitu/STL) can be used to generate telephone-transmitted narrowband speech.
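The two kinds of impairment described above can be sketched in a few lines of Python. This is only an illustration and not the code in boneloss_lowpass.py or the Opus codec: the function names, the 4 kHz cutoff, the filter order, the 20 ms frame length, and the loss rate are all assumptions, and zeroing random frames is a crude stand-in for real codec packet loss.

```python
import numpy as np
from scipy import signal


def simulate_narrowband(wav, sr=16000, cutoff_hz=4000.0, order=8):
    """Low-pass filter a wideband signal to mimic bandwidth loss.

    The cutoff and order are assumed values, not the project's settings.
    """
    # Cutoff is normalized by the Nyquist frequency (sr / 2).
    sos = signal.butter(order, cutoff_hz / (sr / 2.0), btype="low", output="sos")
    # Zero-phase filtering keeps the output time-aligned with the input.
    return signal.sosfiltfilt(sos, wav)


def simulate_packet_loss(wav, sr=16000, frame_ms=20, loss_rate=0.1, seed=0):
    """Zero out randomly chosen frames (hypothetical stand-in for codec loss)."""
    rng = np.random.RandomState(seed)
    frame_len = int(sr * frame_ms / 1000)
    out = np.array(wav, dtype=np.float64, copy=True)
    n_frames = len(out) // frame_len
    # One Bernoulli draw per frame decides whether the frame is lost.
    lost = rng.rand(n_frames) < loss_rate
    for i in np.flatnonzero(lost):
        out[i * frame_len:(i + 1) * frame_len] = 0.0
    return out, lost
```

Each function takes a 1-D array sampled at 16 kHz. The clean signal serves as the training target and the returned signal as the impaired input.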
1.2 Preprocess the wav files and store the features in .tfrecords files (for the GANs) or .h5 files (for the DNN)
To accelerate NN training, the data preprocessing is parallelized. Short-Time Fourier Transform (STFT) features or waveform chunks are extracted as the NN input. For details, see ./GANSC/make_tfrecords.py and ./DNNSC/prepare_data.py.
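As a rough illustration of the two feature types mentioned above, the following NumPy sketch slices a waveform into overlapping chunks and computes a log-magnitude STFT. It is not the code in make_tfrecords.py or prepare_data.py: the function names, chunk length, stride, FFT size, and hop size are assumed values, and writing the results to .tfrecords or .h5 files is left out.

```python
import numpy as np


def waveform_chunks(wav, chunk_len=16384, stride=8192):
    """Slice a 1-D signal into overlapping fixed-length chunks.

    chunk_len and stride are assumed values for illustration.
    """
    wav = np.asarray(wav, dtype=np.float32)
    if len(wav) < chunk_len:
        # Zero-pad short signals so that at least one chunk is produced.
        wav = np.pad(wav, (0, chunk_len - len(wav)), mode="constant")
    starts = range(0, len(wav) - chunk_len + 1, stride)
    return np.stack([wav[s:s + chunk_len] for s in starts])


def stft_log_magnitude(wav, n_fft=512, hop=256):
    """Log-magnitude STFT with a Hann window.

    Returns an array of shape (n_frames, n_fft // 2 + 1).
    """
    wav = np.asarray(wav, dtype=np.float64)
    if len(wav) < n_fft:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack(
        [wav[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    spectrum = np.fft.rfft(frames, axis=1)
    # A small constant avoids log(0) on silent frames.
    return np.log(np.abs(spectrum) + 1e-8)
```

Because each wav file is processed independently, calls like these can be distributed across worker processes, which is one common way to parallelize this kind of preprocessing.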
- Put the .tfrecords or .h5 files in the path from which ./GANSC/data_loader.py or ./DNNSC/main_dnn.py loads the training and validation data for the NN-based SC model.
- Train an NN-based SC model with specific hyperparameters.
  Quick start:
  $ ./GANSC/train_gan.sh or ./DNNSC/runme.sh
  To modify the hyperparameters, see the help information:
  $ python train.py --help
- Put the test wav files in the specified path.
- Set the required parameters for testing, e.g.,
  $ bash ./GANSC/clean_wav.sh or ./DNNSC/runme.sh
You can visualize the training process using TensorBoard:
$ cd project_path
$ tensorboard --logdir=$log_path
Then open a browser and enter IP:6006, where IP is the address of the machine running TensorBoard. For a remote server, use its IP address; if you are training the NN on a local PC, the IP can be localhost:
e.g.,
10.10.88.47:6006
localhost:6006
In this project, the GAN structure is modified from the model proposed by Pascual et al. [2]. If you find it useful for your research, please cite the following papers:
[1] Y. Xu, J. Du, L. R. Dai, and C. H. Lee, "An Experimental Study on Speech Enhancement Based on Deep Neural Networks," IEEE Signal Processing Letters, vol. 21, no. 1, pp. 65-68, Jan. 2014.
[2] S. Pascual, A. Bonafonte, and J. Serrà, "SEGAN: Speech enhancement generative adversarial network," in INTERSPEECH, 2017.
[3] Y. P. Shi, N. H. Zheng, Y. Y. Kang, and W. C. Rong, "Speech Loss Compensation by Generative Adversarial Networks," in APSIPA, 2019.