
Parallel WaveNet: Fast High-Fidelity Speech Synthesis


Paper

Link: https://arxiv.org/pdf/1711.10433.pdf
Year: 2017

Summary

  • high-fidelity speech synthesis with a parallel, feed-forward version of WaveNet trained via Probability Density Distillation

Contributions and Distinctions from Previous Works

  • generates high-fidelity speech at more than 20 times faster than real time, with no significant difference in quality from the original autoregressive WaveNet

Methods

  • modifies WaveNet with inverse-autoregressive flows (IAFs) so that samples can be generated in parallel rather than one at a time
  • uses an already-trained WaveNet as a ‘teacher’ from which a parallel WaveNet ‘student’ can efficiently learn
  • the student cooperates with the teacher (rather than competing adversarially), attempting to match the teacher’s probabilities
  • training minimises the KL divergence between the student’s distribution and the teacher’s, which amounts to maximising the log-likelihood of the student’s samples under the teacher while maximising the student’s own entropy (see the sketch after this list)
  • introduces three additional loss terms: power loss, perceptual loss, and contrastive loss
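
The KL objective above can be estimated by Monte Carlo: draw samples from the student, score them under both networks, and average the log-ratio. Below is a minimal PyTorch sketch of that idea; the function name `distillation_kl` and the toy Gaussian distributions standing in for the IAF student and the WaveNet teacher are illustrative assumptions, not the paper's implementation.

```python
import torch
from torch.distributions import Normal

def distillation_kl(student_log_probs: torch.Tensor,
                    teacher_log_probs: torch.Tensor) -> torch.Tensor:
    """Monte Carlo estimate of KL(student || teacher) over samples x ~ student.

    KL(S || T) = E_{x~S}[log p_S(x) - log p_T(x)]
               = -H(S) - E_{x~S}[log p_T(x)],
    so minimising it maximises the student's entropy and the log-likelihood
    of the student's samples under the teacher, as described above.
    """
    return (student_log_probs - teacher_log_probs).mean()

# Toy stand-ins: simple Gaussians in place of the IAF student and the WaveNet teacher.
student = Normal(loc=0.0, scale=1.5)   # hypothetical student output distribution
teacher = Normal(loc=0.0, scale=1.0)   # hypothetical pre-trained teacher

# rsample() keeps the sampling step differentiable (reparameterisation), which is
# what lets gradients flow back into the student's parameters in the real setup.
x = student.rsample((4, 16000))        # a batch of toy sample sequences
loss = distillation_kl(student.log_prob(x), teacher.log_prob(x))
print(loss.item())
```

In the paper the student's per-timestep output is a logistic whose entropy has a closed form, so the entropy term is computed exactly and only the cross-entropy with the teacher is estimated from samples; the sketch above estimates both terms from samples for simplicity.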

Results

  • deployed to serve Google Assistant queries
  • models audio at a 24 kHz sample rate instead of the 16 kHz used by the original WaveNet