Question about the SeqSelfAttention.
katekats opened this issue · 0 comments
katekats commented
My question is: in the additive self-attention approach, are word embeddings from other timesteps taken into account when calculating the attention weights, or only the embedding at the current timestep (i.e., the word embeddings of the current sentence/input)?
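For context, here is a minimal NumPy sketch of additive (Bahdanau-style) self-attention as it is usually formulated; the weight names (`Wt`, `Wx`, `v`) are illustrative and not necessarily the layer's actual internals. It shows that the score for each timestep is computed against every other timestep in the sequence, so the attention weights depend on all embeddings in the input:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, da = 5, 8, 16            # timesteps, embedding dim, attention dim

X = rng.normal(size=(T, d))    # word embeddings for one sentence
Wt = rng.normal(size=(d, da))  # transforms the "query" timestep
Wx = rng.normal(size=(d, da))  # transforms every "key" timestep
v = rng.normal(size=(da,))

q = X @ Wt                     # (T, da)
k = X @ Wx                     # (T, da)

# e[t, s] scores timestep t against *every* timestep s
e = np.tanh(q[:, None, :] + k[None, :, :]) @ v   # (T, T)

# softmax over s: each row of attention weights spans all timesteps
a = np.exp(e - e.max(axis=1, keepdims=True))
a /= a.sum(axis=1, keepdims=True)

out = a @ X                    # (T, d): each output mixes all timesteps
```

In this formulation the answer would be "all timesteps of the current input sequence": the softmax at position `t` is taken over scores against every position `s`, not just `t` itself.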