Problem with the output shape description in the Prepare for multi-head attention code

Question

JosieChen1214 opened this issue a year ago · 2 comments

Hi, sorry to bother you. In the class PrepareForMultiHeadAttention(nn.Module), your description of the output says it has the shape [seq_len, batch_size, heads, d_k] or [batch_size, d_model]. But I think the output should have the shape [seq_len, batch_size, heads, d_k] or [batch_size, heads, d_k], since x = x.view(*head_shape, self.heads, self.d_k) always splits the last dimension into heads and d_k. If my understanding is wrong, please correct me. Thank you very much!
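To make the shape argument concrete, here is a minimal sketch that only traces the shape arithmetic of the view call, in plain Python without PyTorch. The helper name prepared_shape is hypothetical. It also assumes that head_shape is every input dimension except the last one and that the last dimension is replaced by heads * d_k before the view; neither detail is quoted in this thread.

```python
def prepared_shape(input_shape, heads, d_k):
    """Trace the output shape of x.view(*head_shape, heads, d_k).

    Hypothetical helper for illustration only.
    Assumption: head_shape is every input dimension except the last,
    and the last dimension has size heads * d_k by the time view runs.
    """
    *head_shape, _last_dim = input_shape
    # view keeps the leading dimensions and splits the final one in two
    return (*head_shape, heads, d_k)


# Sequence input [seq_len, batch_size, d_model]
print(prepared_shape((10, 32, 512), heads=8, d_k=64))  # (10, 32, 8, 64)

# Input without a sequence dimension [batch_size, d_model]
print(prepared_shape((32, 512), heads=8, d_k=64))  # (32, 8, 64)
```

Under these assumptions the second case comes out as [batch_size, heads, d_k], not [batch_size, d_model], which is the correction proposed above.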

Answer 1 · 2022-12-18T15:19:00.000Z

Yes, you are right. I will fix it. Thank you.

Answer 2 · 2022-12-24T12:54:36.000Z

Fixed in 3ec5fa9.