lucidrains/Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
Language: Python · License: MIT
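The "multi-headed EMA" in the description refers to a damped exponential moving average that runs several EMA channels ("heads") per input feature. Below is a minimal pure-Python sketch, assuming the recurrence described in the Mega paper: each feature j is expanded into h channels, with u = β·x, state = α·u + (1 − α·δ)·state, and the output is the η-weighted sum of the h channel states. The function name `multi_headed_ema` and the nested-list parameter layout (d × h) are hypothetical choices for illustration, not this repository's API.

```python
from typing import List

Matrix = List[List[float]]  # d x h parameters: one row per feature, one column per EMA head


def multi_headed_ema(
    x: Matrix,      # input sequence: T time steps, each a list of d features
    alpha: Matrix,  # decay weights, expected in (0, 1)
    delta: Matrix,  # damping factors, expected in (0, 1)
    beta: Matrix,   # per-head input expansion weights
    eta: Matrix,    # per-head output projection weights
) -> Matrix:
    """Damped multi-headed EMA (hypothetical sketch, not the repo's implementation)."""
    d, h = len(alpha), len(alpha[0])
    state = [[0.0] * h for _ in range(d)]  # hidden EMA state per feature and head
    outputs = []
    for x_t in x:
        y_t = []
        for j in range(d):
            acc = 0.0
            for k in range(h):
                u = beta[j][k] * x_t[j]  # expand feature j into head k
                # damped EMA update: new input weighted by alpha, old state by (1 - alpha * delta)
                state[j][k] = alpha[j][k] * u + (1.0 - alpha[j][k] * delta[j][k]) * state[j][k]
                acc += eta[j][k] * state[j][k]  # project heads back to one value per feature
            y_t.append(acc)
        outputs.append(y_t)
    return outputs


# Usage: one feature, one head; with delta = 1 this reduces to an ordinary EMA.
print(multi_headed_ema([[1.0], [0.0], [0.0]], [[0.5]], [[1.0]], [[1.0]], [[1.0]]))
```

With α = 0.5 and δ = 1, an impulse decays geometrically (0.5, 0.25, 0.125); setting δ < 1 slows the decay of past state, which is the "damping" the recurrence adds.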
Issues
- #2 Potential incorrect residual (opened by VHellendoorn)
- #1 Squared ReLU and Laplace functions (opened by buttercutter)
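Issue #1 concerns the two attention activation functions associated with Mega. A minimal sketch of both as scalar functions, assuming the Laplace definition from the Mega paper, f(x) = 0.5 · [1 + erf((x − μ) / (σ√2))] with μ = √(1/2) and σ = √(1/(4π)), used as a bounded, smoother stand-in for squared ReLU. The function and constant names are hypothetical and not taken from this repository.

```python
import math

# Assumed constants for the Laplace attention function (as given in the Mega paper).
MU = math.sqrt(0.5)
SIGMA = math.sqrt(1.0 / (4.0 * math.pi))


def squared_relu(x: float) -> float:
    """relu(x)^2: zero for negative inputs, unbounded quadratic growth otherwise."""
    return max(x, 0.0) ** 2


def laplace(x: float, mu: float = MU, sigma: float = SIGMA) -> float:
    """Gaussian CDF centred at mu with scale sigma; output is bounded in [0, 1]."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


for v in (-1.0, 0.0, MU, 1.0, 2.0):
    print(f"x={v:+.3f}  squared_relu={squared_relu(v):.3f}  laplace={laplace(v):.3f}")
```

The design difference is boundedness: squared ReLU grows without limit for large scores, while the Laplace function saturates at 1, which is the usual motivation given for preferring it when attention scores can be large.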