labmlai/annotated_deep_learning_paper_implementations

MultiHeadAttention parameter setting

LXXiaogege opened this issue · 2 comments

Is the output linear layer of the MultiHeadAttention class set incorrectly in the mha.py file? Shouldn't its in_features be heads * d_k?
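
For context, here is a minimal sketch of the shapes being asked about (the names and sizes are illustrative, not the exact mha.py code). After attention, the per-head results are concatenated, so in general the output projection's input size is heads * d_k:

```python
import torch
import torch.nn as nn

heads, d_k, d_model = 8, 64, 512             # here heads * d_k == d_model
x = torch.randn(10, 32, heads, d_k)          # [seq_len, batch_size, heads, d_k]
x = x.reshape(10, 32, heads * d_k)           # concatenate the per-head results
output = nn.Linear(heads * d_k, d_model)     # in_features = heads * d_k
print(output(x).shape)                       # torch.Size([10, 32, 512])
```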

Also, the get_positional_encoding method of the positional encoder raises an error when d_model is set to an odd value.
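
For reference, a sketch of the standard sinusoidal encoding (illustrative, not necessarily the exact repo code) shows why an odd d_model breaks: the sin slice gets ceil(d_model / 2) columns but the cos slice only d_model // 2, so the cos assignment fails with a shape mismatch:

```python
import math
import torch

def get_positional_encoding(d_model: int, max_len: int = 5000):
    encodings = torch.zeros(max_len, d_model)
    position = torch.arange(0, max_len).unsqueeze(1).float()
    two_i = torch.arange(0, d_model, 2).float()
    div_term = torch.exp(two_i * -(math.log(10000.0) / d_model))
    encodings[:, 0::2] = torch.sin(position * div_term)
    # Fails for odd d_model: one more sin column than cos column
    encodings[:, 1::2] = torch.cos(position * div_term)
    return encodings

get_positional_encoding(512)    # works
# get_positional_encoding(513)  # RuntimeError: shape mismatch on the cos assignment
```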

vpj commented

Our implementation assumes that heads * d_k = d_model. We need to change that.
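
To make the assumption concrete, a minimal sketch (hypothetical, not a committed fix): under the current assumption d_k is derived from d_model, so the two ways of sizing the projection coincide; sizing it from heads * d_k would also cover the general case:

```python
import torch.nn as nn

heads, d_model = 8, 512

# Current assumption: d_k is derived from d_model, so
# nn.Linear(d_model, d_model) == nn.Linear(heads * d_k, d_model)
d_k = d_model // heads
assert heads * d_k == d_model

# Hypothetical generalization: size the projection from heads * d_k
# so it also works when heads * d_k != d_model
output = nn.Linear(heads * d_k, d_model)
```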