CyberZHG/keras-self-attention

Question about SeqSelfAttention

katekats opened this issue · 0 comments

My question is: for the additive self-attention approach, are word embeddings from other timesteps taken into account when calculating the attention weights, or only those from the current timestep (meaning the word embeddings of the current sentence/input)?
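To make the question concrete, here is a minimal NumPy sketch of a generic additive (Bahdanau-style) self-attention. It assumes the form h_{t,t'} = tanh(x_t W_t + x_{t'} W_x + b_h) and e_{t,t'} = W_a h_{t,t'} + b_a, with a softmax over t'. Whether SeqSelfAttention computes exactly this is an assumption, and the function and parameter names below are hypothetical. In this formulation the score for position t is computed against every position t' of the *same* input sequence, and samples in a batch never interact.

```python
import numpy as np

def additive_self_attention(x, W_t, W_x, b_h, W_a, b_a):
    """Hypothetical sketch of additive self-attention (not the library's code).

    x: (batch, timesteps, features)
    Returns context (batch, T, features) and weights (batch, T, T).
    """
    # Project every timestep: (batch, T, units)
    q = x @ W_t
    k = x @ W_x
    # Pair every t with every t' of the SAME sequence: (batch, T, T, units)
    h = np.tanh(q[:, :, None, :] + k[:, None, :, :] + b_h)
    # One scalar score per (t, t') pair: (batch, T, T)
    e = (h @ W_a)[..., 0] + b_a
    # Softmax over t' (the timesteps of the same sequence), numerically stable
    e = e - e.max(axis=-1, keepdims=True)
    a = np.exp(e)
    a = a / a.sum(axis=-1, keepdims=True)
    # Context for each t: weighted sum of x[b, t'] over t'
    context = a @ x
    return context, a

rng = np.random.default_rng(0)
B, T, F, U = 2, 4, 3, 5
x = rng.normal(size=(B, T, F))
params = dict(W_t=rng.normal(size=(F, U)), W_x=rng.normal(size=(F, U)),
              b_h=np.zeros(U), W_a=rng.normal(size=(U, 1)), b_a=0.0)
context, a = additive_self_attention(x, **params)
print(a.shape)  # (2, 4, 4): one T x T weight matrix per sample

# Changing sample 1 leaves the attention weights of sample 0 unchanged
x2 = x.copy()
x2[1] += 10.0
_, a2 = additive_self_attention(x2, **params)
print(np.allclose(a[0], a2[0]))  # True
```

Under this assumed formulation, the answer is "all timesteps of the current input": each row a[b, t, :] is a distribution over the positions of sequence b only, so embeddings from other sentences in the batch do not enter the weights.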