Incorrect output shape in the description of PrepareForMultiHeadAttention
JosieChen1214 opened this issue · 2 comments
JosieChen1214 commented
Hi, sorry to bother you. In the class PrepareForMultiHeadAttention(nn.Module), the description says the output has the shape [seq_len, batch_size, heads, d_k] or [batch_size, d_model]. I think it should instead be [seq_len, batch_size, heads, d_k] or [batch_size, heads, d_k], because the last step is x = x.view(*head_shape, self.heads, self.d_k), which always replaces the final dimension with the two dimensions heads and d_k. If my understanding is wrong, please correct me. Thank you very much!
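To make the shapes concrete, here is a minimal sketch that can be run to check both cases. Only the x.view(*head_shape, self.heads, self.d_k) line is taken from the code quoted above; the constructor arguments, the nn.Linear projection, and the example sizes are assumptions made for illustration and may differ from the actual implementation.

```python
import torch
from torch import nn


class PrepareForMultiHeadAttention(nn.Module):
    """Sketch of the module; everything except the view call is assumed."""

    def __init__(self, d_model: int, heads: int, d_k: int, bias: bool = True):
        super().__init__()
        # Assumed: one linear layer projects d_model to heads * d_k features
        self.linear = nn.Linear(d_model, heads * d_k, bias=bias)
        self.heads = heads
        self.d_k = d_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # All dimensions except the last (feature) dimension
        head_shape = x.shape[:-1]
        x = self.linear(x)
        # Split the last dimension into (heads, d_k); line quoted in the issue
        x = x.view(*head_shape, self.heads, self.d_k)
        return x


# Example sizes (arbitrary, chosen for illustration)
seq_len, batch_size, d_model, heads, d_k = 5, 2, 16, 4, 4
prepare = PrepareForMultiHeadAttention(d_model, heads, d_k)

# Input [seq_len, batch_size, d_model] -> [seq_len, batch_size, heads, d_k]
out_3d = prepare(torch.randn(seq_len, batch_size, d_model))
print(tuple(out_3d.shape))

# Input [batch_size, d_model] -> [batch_size, heads, d_k]
out_2d = prepare(torch.randn(batch_size, d_model))
print(tuple(out_2d.shape))
```

Under these assumptions, the two-dimensional input ends up with three dimensions, so [batch_size, d_model] cannot be the output shape.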
vpj commented
Yes, you are right. I will fix it. Thank you!