twistedcubic/attention-rank-collapse
[ICML 2021 Oral] We show pure attention suffers rank collapse, and how different mechanisms combat it.
Jupyter Notebook · Apache-2.0
Stargazers
- aashiqmuhamed (Carnegie Mellon University)
- bjyanli (beijing)
- cfoster0
- chaoyue729 (DX-inc)
- chengchingwen
- chuanmingliu (Westeros)
- Cuda-Chen (Seeking for opportunities)
- digantamisra98 (@mila-iqia @landskape-ai)
- fly51fly (PRIS)
- freddy5566 (University of Washington)
- goodloop (Shanghai)
- gyq716
- imvladikon (Israel)
- JorgeCeja (Stealth)
- juseraru
- l294265421 (Tencent)
- LawGoswell (Peking University)
- loukasa (EPFL)
- lyqun (HKUST)
- Manchery (Tsinghua University)
- mcbal (Belgium)
- odellus (@phytomech)
- owainwest (Thinknum Alternative Data)
- RichardChangCA (University of Ottawa)
- rish-16 (@KrishnaswamyLab @caraml-group)
- romankop
- ShawnKS (The Chinese University of Hong Kong, Shenzhen)
- stjordanis (Greece)
- theblackcat102 (iKala)
- toshas (ETH Zurich)
- ushareng (AI Freelancer)
- ValarChen (ZJU)
- vivym
- wufeim (Johns Hopkins University)
- xxtars
- ZeyuFu (University of Exeter)