Multihead attention implementation
Opened this issue · 1 comment
hash2430 commented
```python
# Concatenate context vector with input (most important)
result = t.cat([decoder_input, result], dim=-1)
```
Excuse me, I don't think I have seen the multihead outputs concatenated with the original input when doing self-attention.
Plus, you commented it as important. I guess I am missing something?
Do you mind if I ask which paper you referred to when implementing this part of multihead attention?
Wallart commented
Same question here.
I didn't see any reference to it in the Transformer TTS paper:
https://arxiv.org/abs/1809.08895
EDIT: It might be linked to "The multi-head attention can integrate the encoder hidden states in multiple perspectives and generate better context vectors" in section 3.6 of the paper. Not sure of my interpretation.
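For what it's worth, the pattern being discussed could be sketched roughly like this in PyTorch. This is only an illustration of the concatenation idea, not the repo's actual code; the module name `ConcatAttention` and all dimensions are placeholders I made up:

```python
import torch
import torch.nn as nn

class ConcatAttention(nn.Module):
    """Hypothetical sketch: multihead attention whose context vector is
    concatenated with the original decoder input before a projection,
    matching the quoted `t.cat([decoder_input, result], dim=-1)` line."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Projection sized for the [input ; context] concatenation
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, decoder_input, encoder_states):
        # context: (batch, tgt_len, dim) attended over the encoder states
        context, _ = self.attn(decoder_input, encoder_states, encoder_states)
        # Concatenate context vector with input, as in the snippet above
        result = torch.cat([decoder_input, context], dim=-1)
        return self.proj(result)

x = torch.randn(2, 5, 16)    # decoder queries
mem = torch.randn(2, 7, 16)  # encoder hidden states
out = ConcatAttention(16, 4)(x, mem)
print(out.shape)  # torch.Size([2, 5, 16])
```

The concatenation keeps the raw input alongside the attended context, so the projection can mix both, which may be what the "better context vectors" remark in section 3.6 is getting at.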