do-you-even-need-attention

Is the attention layer even necessary? Code accompanying the paper "Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet" (https://arxiv.org/abs/2105.02723), which asks whether the self-attention sublayers of a Vision Transformer can be replaced with simple feed-forward layers applied over the patch dimension.
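Below is a minimal sketch of the core idea, not the repository's actual implementation: a transformer-style block in which the attention sublayer is swapped for a feed-forward layer that mixes information across patches. The module and parameter names (`AttentionFreeBlock`, `mlp_ratio`, etc.) are illustrative assumptions, and the exact layer sizes differ from the paper's models.

```python
import torch
import torch.nn as nn


class FeedForward(nn.Module):
    """Standard two-layer MLP applied over the last dimension."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)


class AttentionFreeBlock(nn.Module):
    """Transformer-style block with the self-attention sublayer replaced by a
    feed-forward layer applied across the patch (token) dimension.
    This is a sketch of the paper's idea, not the repo's exact code."""
    def __init__(self, dim, num_patches, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Token mixing: a feed-forward layer over patches stands in for attention.
        self.token_ff = FeedForward(num_patches, num_patches * mlp_ratio)
        self.norm2 = nn.LayerNorm(dim)
        # Channel mixing: the usual per-patch MLP of a transformer block.
        self.channel_ff = FeedForward(dim, dim * mlp_ratio)

    def forward(self, x):                       # x: (batch, num_patches, dim)
        # Transpose so the linear layers act on the patch axis, then transpose back.
        y = self.norm1(x).transpose(1, 2)       # (batch, dim, num_patches)
        x = x + self.token_ff(y).transpose(1, 2)
        # Standard channel-wise MLP with a residual connection.
        x = x + self.channel_ff(self.norm2(x))
        return x


if __name__ == "__main__":
    block = AttentionFreeBlock(dim=384, num_patches=196)
    patches = torch.randn(2, 196, 384)          # e.g. 14x14 patches of a 224px image
    print(block(patches).shape)                 # torch.Size([2, 196, 384])
```

The only structural change from a standard ViT block is the token-mixing step: instead of computing attention weights between patches, a fixed feed-forward layer mixes them, which is what the paper evaluates on ImageNet.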

Primary language: Python
