Uses a naive MultiheadAttention implementation to replace nn.MultiheadAttention in PyTorch
Primary Language: Python
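
To illustrate what a naive replacement for nn.MultiheadAttention might look like, here is a minimal sketch. It is not taken from this repository: the names `NaiveMultiheadAttention` and `copy_weights_from` are hypothetical, and the sketch assumes batch-first inputs, no attention or padding masks, no dropout, and the default `kdim`/`vdim` (so the reference module exposes a packed `in_proj_weight`).

```python
import math

import torch
import torch.nn as nn


class NaiveMultiheadAttention(nn.Module):
    """Hypothetical naive multi-head attention (batch-first, no masks, no dropout)."""

    def __init__(self, embed_dim, num_heads):
        super().__init__()
        if embed_dim % num_heads != 0:
            raise ValueError("embed_dim must be divisible by num_heads")
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Separate projections instead of the packed in_proj used by nn.MultiheadAttention.
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def _split_heads(self, x):
        # (batch, seq, embed_dim) -> (batch, heads, seq, head_dim)
        batch, seq, _ = x.shape
        return x.reshape(batch, seq, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, query, key, value):
        q = self._split_heads(self.q_proj(query))
        k = self._split_heads(self.k_proj(key))
        v = self._split_heads(self.v_proj(value))
        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        weights = torch.softmax(scores, dim=-1)
        context = weights @ v  # (batch, heads, tgt_len, head_dim)
        batch, _, tgt_len, _ = context.shape
        # Concatenate the heads back into a single embed_dim vector per position.
        context = context.transpose(1, 2).reshape(batch, tgt_len, self.embed_dim)
        # Return head-averaged weights, matching nn.MultiheadAttention's default.
        return self.out_proj(context), weights.mean(dim=1)


def copy_weights_from(naive, reference):
    """Hypothetical helper: copy parameters from an nn.MultiheadAttention module."""
    with torch.no_grad():
        # The packed projection stacks the query, key, and value weights along dim 0.
        w_q, w_k, w_v = reference.in_proj_weight.chunk(3, dim=0)
        b_q, b_k, b_v = reference.in_proj_bias.chunk(3, dim=0)
        for proj, w, b in ((naive.q_proj, w_q, b_q),
                           (naive.k_proj, w_k, b_k),
                           (naive.v_proj, w_v, b_v)):
            proj.weight.copy_(w)
            proj.bias.copy_(b)
        naive.out_proj.weight.copy_(reference.out_proj.weight)
        naive.out_proj.bias.copy_(reference.out_proj.bias)


# Usage: compare the naive module against the built-in one on self-attention.
torch.manual_seed(0)
reference = nn.MultiheadAttention(16, 4, batch_first=True).eval()
naive = NaiveMultiheadAttention(16, 4).eval()
copy_weights_from(naive, reference)

x = torch.randn(2, 5, 16)  # (batch, seq, embed_dim)
with torch.no_grad():
    out_ref, attn_ref = reference(x, x, x)
    out_naive, attn_naive = naive(x, x, x)
print(torch.allclose(out_ref, out_naive, atol=1e-5))
```

Keeping separate query, key, and value projections makes each step easy to inspect, at the cost of the fused kernels the built-in module can use. Copying the weights across, as above, is one way to check that such a replacement agrees with nn.MultiheadAttention before swapping it in.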