MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement (ICML 2019)

Update !!!

An updated version, MetricGAN+, can be found here: paper and code.

Introduction

MetricGAN is a Generative Adversarial Networks (GAN) based black-box metric scores optimization method. By associating the discriminator (D) with the metrics of interest, MetricGAN can be treated as an iterative process between surrogate loss learning and generator learning as shown in the following figure.

This code (developed with Keras) applies MetricGAN to optimize PESQ or STOI score for Speech Enhancement. It can also be easily extended to optimize other metrics.

For more details and evaluation results, please check out our paper.

The pesq improving process for the voicebank test set is as follows:

Dependencies:

Python 2.7
keras=2.0.9
librosa=0.5.1

Note!

As mentioned in the paper, the input features and activation functions used in table 2 are different from those provided here.
The following codes are created by others:

SpectralNormalizationKeras: SpectralNormalization in Keras
pystoi: stoi calculatuin in python (modified by me)
The PESQ file can only be implemented in Linux environment.

Citation

If you find the code useful in your research, please cite:

@inproceedings{fu2019metricGAN,
  title     = {MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement},
  author    = {Fu, Szu-Wei and Liao, Chien-Feng and Tsao, Yu and Lin, Shou-De},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2019}
}

Contact

e-mail: jasonfu@iis.sinica.edu.tw or d04922007@ntu.edu.tw