Superklez opened this issue 4 years ago · 0 comments
Is the input shape of MultiHeadAttention [batch_size, sequence_length, embedding_size]? Or is it the same as nn.MultiheadAttention, where the input shape must be [sequence_length, batch_size, embedding_size]?
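For comparison, here is a minimal sketch of how PyTorch's own torch.nn.MultiheadAttention handles the two layouts (the dimensions and hyperparameters below are arbitrary examples, and this snippet says nothing about the MultiHeadAttention module being asked about): by default it expects [sequence_length, batch_size, embedding_size], and passing batch_first=True (available since PyTorch 1.9) switches it to [batch_size, sequence_length, embedding_size].

```python
import torch
import torch.nn as nn

seq_len, batch_size, embed_dim, num_heads = 10, 4, 32, 8

# Default layout: (sequence_length, batch_size, embedding_size)
mha_seq_first = nn.MultiheadAttention(embed_dim, num_heads)
x_seq_first = torch.randn(seq_len, batch_size, embed_dim)
out, _ = mha_seq_first(x_seq_first, x_seq_first, x_seq_first)
print(out.shape)  # torch.Size([10, 4, 32])

# batch_first=True layout: (batch_size, sequence_length, embedding_size)
mha_batch_first = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
x_batch_first = torch.randn(batch_size, seq_len, embed_dim)
out, _ = mha_batch_first(x_batch_first, x_batch_first, x_batch_first)
print(out.shape)  # torch.Size([4, 10, 32])
```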