Rubics-Xuan/TransBTS

Multi-Head Attention


It seems that the Multi-Head Attention module does not actually implement the 8 attention heads?

# per-head attention weights: q and k have shape (B, num_heads, N, head_dim)
attn = (q @ k.transpose(-2, -1)) * self.scale
attn = attn.softmax(dim=-1)
attn = self.attn_drop(attn)
# merge the heads back: (B, num_heads, N, head_dim) -> (B, N, C)
x = (attn @ v).transpose(1, 2).reshape(B, N, C)

Although the code contains no explicit concatenation of the heads, the snippet above does implement multi-head self-attention: the heads are split off into a separate tensor dimension before the attention is computed, and the final .transpose(1, 2).reshape(B, N, C) merges them back, which is equivalent to concatenating the per-head outputs.
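
For reference, here is a minimal sketch of a ViT-style multi-head self-attention module in the same pattern as the snippet above (names such as num_heads, qkv_bias, and the projection layers are assumptions for illustration, not necessarily the exact TransBTS code). It shows where the heads are split and where they are merged back:

import torch
import torch.nn as nn

class Attention(nn.Module):
    # Illustrative multi-head self-attention (assumed names, not the exact TransBTS class).
    def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0.0, proj_drop=0.0):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)

    def forward(self, x):
        B, N, C = x.shape
        # Split into heads: (B, N, 3*C) -> (3, B, num_heads, N, head_dim)
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]

        # Scaled dot-product attention computed for all heads in parallel via batched matmul
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)

        # (B, num_heads, N, head_dim) -> (B, N, C): this reshape is the implicit concatenation of heads
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        x = self.proj(x)
        return self.proj_drop(x)

For example, with dim = 512 and num_heads = 8, q, k, and v each have shape (B, 8, N, 64), so the eight heads attend independently, and the final reshape to (B, N, 512) stitches their 64-dimensional outputs back together.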