Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

Paper

Link: https://arxiv.org/pdf/1910.10288.pdf
Year: 2019

Summary

  • uses simple location-relative attention mechanisms that do away with content-based query/key comparisons, in order to handle out-of-domain text
  • introduces Dynamic Convolution Attention (DCA), a new location-relative attention mechanism in the additive energy-based family; a rough sketch appears below
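A minimal sketch of one DCA step, to make the summary concrete. It is written from the paper's high-level description rather than the authors' code; the layer sizes, kernel sizes, and the uniform `prior_filter` placeholder (the paper derives its prior from a beta-binomial distribution) are assumptions for illustration.

```python
# Hedged sketch of Dynamic Convolution Attention (DCA); shapes and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicConvolutionAttention(nn.Module):
    def __init__(self, query_dim=512, attn_dim=128,
                 static_channels=8, static_kernel=21,
                 dynamic_channels=8, dynamic_kernel=21,
                 prior_kernel=11):
        super().__init__()
        # Static filters: a fixed bank of 1-D convolutions over the previous
        # alignment, as in location-sensitive attention.
        self.static_conv = nn.Conv1d(1, static_channels, static_kernel,
                                     padding=static_kernel // 2, bias=False)
        # Dynamic filters: predicted from the decoder query at every step,
        # then convolved with the previous alignment.
        self.dynamic_proj = nn.Linear(query_dim, dynamic_channels * dynamic_kernel)
        self.dynamic_channels = dynamic_channels
        self.dynamic_kernel = dynamic_kernel
        # Fixed prior filter nudging the alignment forward; a uniform
        # placeholder here (assumption), beta-binomial in the paper.
        prior = torch.ones(prior_kernel) / prior_kernel
        self.register_buffer("prior_filter", prior.view(1, 1, -1))
        self.prior_kernel = prior_kernel
        # Additive energy terms: note there is no content-based query/key term.
        self.W_static = nn.Linear(static_channels, attn_dim, bias=False)
        self.W_dynamic = nn.Linear(dynamic_channels, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, query, prev_alignment):
        """query: (B, query_dim); prev_alignment: (B, T) over encoder steps."""
        B, T = prev_alignment.shape
        prev = prev_alignment.unsqueeze(1)                        # (B, 1, T)

        # Static location features from the previous alignment.
        f = self.static_conv(prev).transpose(1, 2)                # (B, T, C_s)

        # Dynamic location features: per-example filters computed from the query,
        # applied to the previous alignment via a grouped convolution.
        filters = self.dynamic_proj(query).view(
            B * self.dynamic_channels, 1, self.dynamic_kernel)
        g = F.conv1d(prev.view(1, B, T), filters,
                     padding=self.dynamic_kernel // 2, groups=B)
        g = g.view(B, self.dynamic_channels, T).transpose(1, 2)   # (B, T, C_d)

        # Log-domain prior term encouraging forward movement.
        p = F.conv1d(prev, self.prior_filter, padding=self.prior_kernel // 2)
        p = torch.log(p.squeeze(1).clamp_min(1e-6))               # (B, T)

        # Additive energies built only from location-relative terms.
        energy = self.v(torch.tanh(self.W_static(f) + self.W_dynamic(g))).squeeze(-1) + p
        return F.softmax(energy, dim=-1)                          # new alignment (B, T)
```

At each decoder step the previous alignment and the decoder state produce the next alignment; the encoder outputs are never compared against the query, which is what lets the mechanism generalize to utterance lengths unseen in training.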

Results

  • Dynamic Convolution Attention and V2 GMM attention with initial bias (GMMv2b) are able to generalize to utterances much longer than those seen during training, while preserving naturalness on shorter utterances
  • improved speed and consistency of alignment during training
  • advantages of DCA over GMM attention:
  1. DCA can more easily bound its receptive field, which makes it easier to incorporate hard windowing optimizations in production
  2. its attention weights are normalized, which helps to stabilize the alignment, especially for coarse-grained alignment tasks

Comments