
Rectified Linear Attention

This repo contains a PyTorch implementation of Sparse Attention with Linear Units. It is not the official repo, so some details may vary from the paper.
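
To make the idea concrete, here is a minimal sketch of rectified linear attention in PyTorch, assuming the formulation described in the paper: the softmax in scaled dot-product attention is replaced with ReLU, and the aggregated values are re-scaled with RMSNorm to keep training stable. The class and argument names below are illustrative and not necessarily the ones used in this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-8):
        super().__init__()
        self.eps = eps
        self.g = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Normalize by the root-mean-square over the last dimension.
        rms = x.norm(dim=-1, keepdim=True) * (x.shape[-1] ** -0.5)
        return x / (rms + self.eps) * self.g


class ReLAttention(nn.Module):
    """Illustrative sketch of rectified linear attention (ReLA)."""

    def __init__(self, dim, heads=8, dim_head=64):
        super().__init__()
        inner = heads * dim_head
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.to_qkv = nn.Linear(dim, inner * 3, bias=False)
        self.norm = RMSNorm(dim_head)
        self.to_out = nn.Linear(inner, dim)

    def forward(self, x):
        b, n, _ = x.shape
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in qkv)

        # ReLU in place of softmax: sparse, unnormalized attention weights.
        attn = F.relu(q @ k.transpose(-1, -2) * self.scale)

        out = attn @ v
        out = self.norm(out)  # RMSNorm over the aggregated values
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)


x = torch.randn(2, 16, 512)
print(ReLAttention(512)(x).shape)  # torch.Size([2, 16, 512])
```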

Citation:

@misc{zhang2021sparse,
      title={Sparse Attention with Linear Units}, 
      author={Biao Zhang and Ivan Titov and Rico Sennrich},
      year={2021},
      eprint={2104.07012},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

References:

  • Transformer components and the initial attention code are from lucidrains' vit-pytorch.
  • The RMSNorm code is from this repo.