
Parallel WaveNet: Fast High-Fidelity Speech Synthesis


Paper

Link: https://arxiv.org/pdf/1711.10433.pdf
Year: 2017

Summary

  • high-fidelity speech synthesis with a parallel, feed-forward version of WaveNet trained via Probability Density Distillation

Contributions and Distinctions from Previous Works

  • generates high-fidelity speech at more than 20 times faster than real time, with no significant difference in quality from the original autoregressive WaveNet

Methods

  • modifies WaveNet with inverse-autoregressive flows (IAFs) so that samples can be generated in parallel rather than one at a time
  • uses an already-trained WaveNet as a ‘teacher’ from which a parallel WaveNet ‘student’ can efficiently learn
  • the student cooperates with the teacher (rather than competing adversarially), attempting to match the teacher’s probabilities
  • training minimises the KL divergence between the student’s distribution and the teacher’s, which amounts to maximising the log-likelihood of the student’s samples under the teacher while maximising the student’s own entropy (see the sketch after this list)
  • introduces three additional loss terms: power loss, perceptual loss, and contrastive loss
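
The KL objective above can be estimated by Monte Carlo: draw samples from the student, score them under both networks, and average the log-ratio. Below is a minimal PyTorch sketch of that idea; the function name `distillation_kl` and the toy Gaussian distributions standing in for the IAF student and the WaveNet teacher are illustrative assumptions, not the paper's implementation.

```python
import torch
from torch.distributions import Normal

def distillation_kl(student_log_probs: torch.Tensor,
                    teacher_log_probs: torch.Tensor) -> torch.Tensor:
    """Monte Carlo estimate of KL(student || teacher) over samples x ~ student.

    KL(S || T) = E_{x~S}[log p_S(x) - log p_T(x)]
               = -H(S) - E_{x~S}[log p_T(x)],
    so minimising it maximises the student's entropy and the log-likelihood
    of the student's samples under the teacher, as described above.
    """
    return (student_log_probs - teacher_log_probs).mean()

# Toy stand-ins: simple Gaussians in place of the IAF student and the WaveNet teacher.
student = Normal(loc=0.0, scale=1.5)   # hypothetical student output distribution
teacher = Normal(loc=0.0, scale=1.0)   # hypothetical pre-trained teacher

# rsample() keeps the sampling step differentiable (reparameterisation), which is
# what lets gradients flow back into the student's parameters in the real setup.
x = student.rsample((4, 16000))        # a batch of toy sample sequences
loss = distillation_kl(student.log_prob(x), teacher.log_prob(x))
print(loss.item())
```

In the paper the student's per-timestep output is a logistic whose entropy has a closed form, so the entropy term is computed exactly and only the cross-entropy with the teacher is estimated from samples; the sketch above estimates both terms from samples for simplicity.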

Results

  • deployed to serve Google Assistant queries
  • models audio at a 24 kHz sample rate instead of the 16 kHz used by the original WaveNet