
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis


Paper

Link: https://arxiv.org/pdf/2011.03568.pdf
Year: 2020

Summary

Wave-Tacotron replaces Tacotron's spectrogram decoder with a normalizing flow that generates blocks of waveform samples directly in a block-autoregressive loop, removing the need for a separate vocoder and for spectrogram features as an intermediate representation.
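A minimal sketch of that block-autoregressive loop, with placeholder functions standing in for the flow's inverse pass and the decoder (`flow_inverse`, `decoder_features`, `BLOCK`, and `MAX_STEPS` are illustrative names and toy sizes, not from the paper; the real model conditions a stack of invertible coupling layers on pre-net and attention outputs):

```python
import random

random.seed(0)

BLOCK = 8          # samples generated per decoder step (the paper uses far larger blocks)
MAX_STEPS = 4      # stand-in for the model's learned stop mechanism

def flow_inverse(z, cond):
    # Placeholder for the normalizing flow's inverse pass: maps Gaussian
    # noise z to a waveform block, conditioned on decoder features.
    return [zi * 0.1 + c for zi, c in zip(z, cond)]

def decoder_features(prev_block):
    # Placeholder for pre-net + attention over the text encoding.
    return [sum(prev_block) / len(prev_block)] * BLOCK

waveform = []
prev = [0.0] * BLOCK
for _ in range(MAX_STEPS):
    cond = decoder_features(prev)                 # condition on previous block
    z = [random.gauss(0.0, 1.0) for _ in range(BLOCK)]
    prev = flow_inverse(z, cond)                  # sample one waveform block
    waveform.extend(prev)

print(len(waveform))  # MAX_STEPS blocks of BLOCK samples each
```

Each step emits a whole block of samples rather than a single spectrogram frame, which is what makes synthesis end-to-end in the waveform domain.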

Contributions and Distinctions from Previous Works

  • the text encoding is consumed via attention by a block-autoregressive decoder, which produces the conditioning features for the flow
  • uses location-sensitive attention, which proved more stable than non-content-based GMM attention
  • replaces ReLU activations with tanh in the pre-net
  • adds a skip connection over the decoder pre-net and attention layers, giving the flow direct access to the samples immediately preceding the current frame
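The last three bullets can be sketched together in one toy decoder step. This is a hedged illustration, not the paper's implementation: `prenet`, `decoder_step`, the weight shapes, and the concatenation order are assumptions chosen to show the tanh pre-net and the skip connection that bypasses it:

```python
import math
import random

random.seed(0)

def prenet(x, w1, w2):
    # Two dense layers; Wave-Tacotron swaps the usual ReLU for tanh here.
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    return [math.tanh(sum(wi * hi for wi, hi in zip(row, h))) for row in w2]

def decoder_step(prev_samples, attention_context, w1, w2):
    # Bottleneck the preceding samples through the pre-net ...
    h = prenet(prev_samples, w1, w2)
    # ... then the skip connection re-attaches the raw preceding samples,
    # so the flow conditioning sees them directly rather than only
    # through the pre-net's lossy summary.
    return h + prev_samples + attention_context

dim = 4
w1 = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
w2 = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]

prev_samples = [0.1, -0.2, 0.3, 0.05]   # samples preceding the current frame
context = [0.0, 1.0, 0.0, 0.0]          # attention over the text encoding

cond = decoder_step(prev_samples, context, w1, w2)
print(len(cond))  # pre-net output + skipped samples + attention context
```

The design intuition: tanh keeps pre-net activations bounded, and the skip connection ensures the autoregressive flow is never starved of the exact recent samples it must continue from.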