
Keras framework for speech enhancement using relativistic GANs



Uses a fully convolutional end-to-end speech enhancement system.

Implementation details of the paper accepted to ICASSP 2019:

Deepak Baby and Sarah Verhulst, SERGAN: Speech enhancement using relativistic generative adversarial networks with gradient penalty, IEEE-ICASSP, pp. 106-110, May 2019, Brighton, UK.

This work was funded with support from the EU Horizon 2020 programme under grant agreement No 678120 (RobSpear).


  1. Install TensorFlow (tested with TensorFlow v1.13.2) and Keras (tested with Keras v2.3.1).
  2. Install tqdm, which is used to display the training progress.
  3. The experiments use the dataset from Valentini et al. The following script downloads the dataset. It requires sox for converting the audio to 16 kHz.
    $ ./download_dataset.sh
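The download script depends on sox for the 16 kHz conversion. As an illustration of that step only (this is not the contents of download_dataset.sh), a single file could be resampled from Python roughly as follows. The helper names `build_sox_command` and `resample_file` are hypothetical, and the sketch assumes sox is available on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_sox_command(src, dst, rate_hz=16000):
    """Return the sox argument list that resamples src to rate_hz and writes dst."""
    # `sox <input> -r <rate> <output>` sets the sample rate of the output file.
    return ["sox", str(src), "-r", str(rate_hz), str(dst)]


def resample_file(src, dst, rate_hz=16000):
    """Resample one audio file with sox (hypothetical helper, not part of the repo)."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox was not found on PATH")
    # Create the output folder if it does not exist yet.
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if sox exits with an error.
    subprocess.run(build_sox_command(src, dst, rate_hz), check=True)
```

Keeping the command construction separate from the call to sox makes it easy to inspect the exact command before running it over a whole folder.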

Running the model

  1. Prepare the data for training and testing the various models. Edit the folder path if you keep the database in a different folder. This script needs to be executed only once; all the models read from the same location.

    python prepare_data.py
  2. Run the models. The models available in this repository are listed below. Every implementation offers several cGAN configurations; edit the opts variable to choose the configuration. The results are saved automatically to separate folders. The folder name is generated by files_ops.py and encodes the chosen configuration options.

    1. run_aecnn.py : Auto-encoder CNN model with L1 loss term (No discriminator)
    2. run_lsgan_se.py : SEGAN with least-squares loss [1]
    3. run_wgan-gp_se.py : GAN model with Wasserstein loss and Gradient Penalty
    4. run_rsgan-gp_se.py : GAN model with relativistic standard GAN with Gradient Penalty
    5. run_rasgan-gp_se.py : GAN model with relativistic average standard GAN with Gradient Penalty
    6. run_ralsgan-gp_se.py : GAN model with relativistic average least-squares GAN with Gradient Penalty
  3. Evaluation on the test set is run together with training. Set TEST_SEGAN = False to disable testing.
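To make the difference between the relativistic variants in the list above more concrete, here is a minimal NumPy sketch of two of the adversarial objectives: the relativistic standard GAN (RSGAN) and the relativistic average least-squares GAN (RaLSGAN). It follows the standard formulation from the relativistic GAN literature and is not taken from the Keras code in this repository. The function names are hypothetical, `c_real` and `c_fake` stand for raw (pre-activation) discriminator outputs on clean and enhanced speech, and the gradient penalty and L1 terms are left out.

```python
import numpy as np


def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def rsgan_losses(c_real, c_fake):
    """Relativistic standard GAN losses from paired discriminator outputs."""
    # -log(sigmoid(z)) equals softplus(-z).
    d_loss = np.mean(_softplus(-(c_real - c_fake)))  # real should score above fake
    g_loss = np.mean(_softplus(-(c_fake - c_real)))  # fake should score above real
    return d_loss, g_loss


def ralsgan_losses(c_real, c_fake):
    """Relativistic average least-squares GAN losses."""
    # Each output is compared against the batch average of the opposite class.
    real_rel = c_real - np.mean(c_fake)
    fake_rel = c_fake - np.mean(c_real)
    d_loss = np.mean((real_rel - 1.0) ** 2) + np.mean((fake_rel + 1.0) ** 2)
    g_loss = np.mean((fake_rel - 1.0) ** 2) + np.mean((real_rel + 1.0) ** 2)
    return d_loss, g_loss
```

In both cases the generator loss is the discriminator loss with the roles of real and fake swapped, which is what distinguishes these objectives from the non-relativistic least-squares loss used in run_lsgan_se.py.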


  • This code loads all the data into memory to speed up training. If you do not have enough memory, the mini-batches can instead be read from disk through HDF5. In run_<xxx>.py
    clean_train_data = np.array(fclean['feat_data'])
    noisy_train_data = np.array(fnoisy['feat_data'])
    change the above lines to
    clean_train_data = fclean['feat_data']
    noisy_train_data = fnoisy['feat_data']
    Note that this can slow training down by a factor of about 20 (on the test machine), because the mini-batches have to be read from disk in every epoch.
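The two loading modes can be compared on a toy file. The sketch below assumes h5py and NumPy are installed. It builds a small in-memory HDF5 file as a stand-in for the prepared data; only the dataset name `feat_data` comes from the snippet above, while the file name, array shape, and the `iter_minibatches` helper are made up for illustration.

```python
import h5py
import numpy as np

# Tiny in-memory HDF5 file standing in for the prepared data file.
# With a real file on disk, drop the driver/backing_store arguments and open it with "r".
fclean = h5py.File("clean_demo.h5", "w", driver="core", backing_store=False)
fclean.create_dataset("feat_data", data=np.arange(40, dtype=np.float32).reshape(10, 4))

# Option 1: copy the whole dataset into RAM once; later indexing is plain NumPy.
in_memory = np.array(fclean["feat_data"])

# Option 2: keep the h5py Dataset handle; every slice is read from the file on demand.
on_disk = fclean["feat_data"]


def iter_minibatches(data, batch_size):
    """Yield consecutive mini-batches; works for NumPy arrays and h5py datasets."""
    for start in range(0, data.shape[0], batch_size):
        # Slicing an h5py Dataset returns a NumPy array holding only this batch.
        yield data[start:start + batch_size]


batches_ram = list(iter_minibatches(in_memory, 4))
batches_disk = list(iter_minibatches(on_disk, 4))
fclean.close()
```

Both options yield identical batches; the difference is only where the read happens. Shuffled batches need extra care with the second option, because h5py requires index lists to be in increasing order.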


[1] S. Pascual, A. Bonafonte, and J. Serra, SEGAN: Speech enhancement generative adversarial network, in Proc. INTERSPEECH, ISCA, Aug. 2017, pp. 3642–3646.


The keras implementation of cGAN is based on the following repos