MultiHeadAttention parameter setting
LXXiaogege opened this issue · 2 comments
LXXiaogege commented
Is the output linear layer of the MultiHeadAttention class in mha.py set incorrectly? Shouldn't its in_features be heads * d_k?
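To illustrate the shape concern: the concatenated head outputs have width heads * d_k, so the output projection's in_features must equal heads * d_k, which only coincides with d_model when heads * d_k = d_model. A minimal numpy shape sketch (variable names are illustrative, not from mha.py):

```python
import numpy as np

seq_len, d_model = 5, 8
heads, d_k = 2, 3  # heads * d_k = 6 != d_model, so the sizes diverge

# each head produces (seq_len, d_k); concatenating gives (seq_len, heads * d_k)
head_outputs = [np.random.randn(seq_len, d_k) for _ in range(heads)]
concat = np.concatenate(head_outputs, axis=-1)
assert concat.shape == (seq_len, heads * d_k)

# an output projection declared with in_features = d_model fails here
w_out_wrong = np.random.randn(d_model, d_model)
try:
    concat @ w_out_wrong
    wrong_in_features_ok = True
except ValueError:  # shape mismatch: 6 vs 8
    wrong_in_features_ok = False

# with in_features = heads * d_k the projection matches
w_out = np.random.randn(heads * d_k, d_model)
out = concat @ w_out
assert out.shape == (seq_len, d_model)
```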
LXXiaogege commented
The get_positional_encoding method of the positional encoder also raises an error when d_model is set to an odd number.
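For context, the standard sinusoidal encoding fills even columns with sines and odd columns with cosines; with an odd d_model there is one more even column than odd, so the cosine assignment no longer broadcasts. A sketch of the failure mode (this is the generic formulation, not the exact labml code):

```python
import numpy as np

def get_pe(seq_len, d_model):
    # typical sinusoidal positional encoding
    pos = np.arange(seq_len)[:, None]
    div = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(pos * div)  # ceil(d_model / 2) columns
    pe[:, 1::2] = np.cos(pos * div)  # floor(d_model / 2) columns: mismatch when odd
    return pe

pe_even = get_pe(10, 8)  # even d_model works
try:
    get_pe(10, 7)  # odd d_model: cos term has 4 columns, slice has 3
    odd_ok = True
except ValueError:
    odd_ok = False
```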
vpj commented
Our implementation assumes that heads * d_k = d_model. We need to change that.