
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis


Paper

Link: https://arxiv.org/pdf/2011.03568.pdf
Year: 2020

Summary

Wave-Tacotron replaces Tacotron's spectrogram decoder with a normalizing flow that generates blocks of waveform samples directly in a block-autoregressive loop, removing the need for a separate vocoder and for spectrogram features as an intermediate representation.
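A minimal sketch of that block-autoregressive loop, with placeholder functions standing in for the flow's inverse pass and the decoder (`flow_inverse`, `decoder_features`, `BLOCK`, and `MAX_STEPS` are illustrative names and toy sizes, not from the paper; the real model conditions a stack of invertible coupling layers on pre-net and attention outputs):

```python
import random

random.seed(0)

BLOCK = 8          # samples generated per decoder step (the paper uses far larger blocks)
MAX_STEPS = 4      # stand-in for the model's learned stop mechanism

def flow_inverse(z, cond):
    # Placeholder for the normalizing flow's inverse pass: maps Gaussian
    # noise z to a waveform block, conditioned on decoder features.
    return [zi * 0.1 + c for zi, c in zip(z, cond)]

def decoder_features(prev_block):
    # Placeholder for pre-net + attention over the text encoding.
    return [sum(prev_block) / len(prev_block)] * BLOCK

waveform = []
prev = [0.0] * BLOCK
for _ in range(MAX_STEPS):
    cond = decoder_features(prev)                 # condition on previous block
    z = [random.gauss(0.0, 1.0) for _ in range(BLOCK)]
    prev = flow_inverse(z, cond)                  # sample one waveform block
    waveform.extend(prev)

print(len(waveform))  # MAX_STEPS blocks of BLOCK samples each
```

Each step emits a whole block of samples rather than a single spectrogram frame, which is what makes synthesis end-to-end in the waveform domain.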

Contributions and Distinctions from Previous Works

  • the text encoding is consumed via attention by a block-autoregressive decoder, which produces the conditioning features for the flow
  • uses location-sensitive attention, which proved more stable than non-content-based GMM attention
  • replaces ReLU activations with tanh in the pre-net
  • adds a skip connection over the decoder pre-net and attention layers, giving the flow direct access to the samples immediately preceding the current frame
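The last three bullets can be sketched together in one toy decoder step. This is a hedged illustration, not the paper's implementation: `prenet`, `decoder_step`, the weight shapes, and the concatenation order are assumptions chosen to show the tanh pre-net and the skip connection that bypasses it:

```python
import math
import random

random.seed(0)

def prenet(x, w1, w2):
    # Two dense layers; Wave-Tacotron swaps the usual ReLU for tanh here.
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    return [math.tanh(sum(wi * hi for wi, hi in zip(row, h))) for row in w2]

def decoder_step(prev_samples, attention_context, w1, w2):
    # Bottleneck the preceding samples through the pre-net ...
    h = prenet(prev_samples, w1, w2)
    # ... then the skip connection re-attaches the raw preceding samples,
    # so the flow conditioning sees them directly rather than only
    # through the pre-net's lossy summary.
    return h + prev_samples + attention_context

dim = 4
w1 = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
w2 = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]

prev_samples = [0.1, -0.2, 0.3, 0.05]   # samples preceding the current frame
context = [0.0, 1.0, 0.0, 0.0]          # attention over the text encoding

cond = decoder_step(prev_samples, context, w1, w2)
print(len(cond))  # pre-net output + skipped samples + attention context
```

The design intuition: tanh keeps pre-net activations bounded, and the skip connection ensures the autoregressive flow is never starved of the exact recent samples it must continue from.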