labmlai/annotated_deep_learning_paper_implementations

The problem with the description of the output in the code of Prepare for multi-head attention

JosieChen1214 opened this issue · 2 comments

Hi, sorry for bothering you. In the class `PrepareForMultiHeadAttention(nn.Module)`, your description of the output says it has the shape `[seq_len, batch_size, heads, d_k]` or `[batch_size, d_model]`. However, I think the output should have the shape `[seq_len, batch_size, heads, d_k]` or `[batch_size, heads, d_k]`, since `x = x.view(*head_shape, self.heads, self.d_k)` splits the last dimension into `(heads, d_k)`. If my understanding is wrong, please correct me. Thank you very much!
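For context, here is a minimal sketch of the class in question (a simplified constructor, assuming the projection maps `d_model` to `heads * d_k`), showing why the second case comes out as `[batch_size, heads, d_k]`: `head_shape` keeps every dimension except the last, and `view` splits the last dimension into `(heads, d_k)`.

```python
import torch
import torch.nn as nn


class PrepareForMultiHeadAttention(nn.Module):
    """Simplified sketch: project the input and split it into heads."""

    def __init__(self, d_model: int, heads: int, d_k: int, bias: bool = True):
        super().__init__()
        # Linear projection from d_model to heads * d_k
        self.linear = nn.Linear(d_model, heads * d_k, bias=bias)
        self.heads = heads
        self.d_k = d_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Input has shape [seq_len, batch_size, d_model] or [batch_size, d_model];
        # head_shape keeps every dimension except the last one.
        head_shape = x.shape[:-1]
        x = self.linear(x)
        # Split the last dimension into (heads, d_k), so the output has shape
        # [seq_len, batch_size, heads, d_k] or [batch_size, heads, d_k].
        return x.view(*head_shape, self.heads, self.d_k)


prep = PrepareForMultiHeadAttention(d_model=16, heads=4, d_k=4)
out3 = prep(torch.zeros(10, 2, 16))  # [seq_len=10, batch_size=2, d_model=16]
out2 = prep(torch.zeros(2, 16))      # [batch_size=2, d_model=16]
print(tuple(out3.shape))  # (10, 2, 4, 4)
print(tuple(out2.shape))  # (2, 4, 4)
```

So in the 2D case the output is `[batch_size, heads, d_k]`, not `[batch_size, d_model]`.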


vpj commented

Yes, you are right. Will fix it. Thank you!

vpj commented

Fixed 3ec5fa9