astramind-ai/Mixture-of-depths
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
Python
Stargazers
- 0xHenriksson (Gilroy, CA)
- 90r
- AArchLichKing
- BOOLXXY
- cjxcn (Peking University)
- Dcas89
- deng451e
- di-osc (beijing)
- Dingpx (Zhejiang University)
- djoffrey
- eric8810 (STAST)
- footsome
- GeneZC
- h-zhao1997 (Westlake University)
- JeffCarpenter (Canada)
- JingyangDeng
- luciencho
- lymiao888 (china)
- mazeyang (Tsinghua University)
- NizarIslah (Montreal, Canada)
- PhoebusSi (UCAS)
- SbrNight
- scissorstail (Seoul, South Korea)
- shyamsn97
- skykiseki (China)
- SmartTany
- tanghui315
- TGLTommy
- torrid-fish (Taiwan)
- ultranity
- xiangbogaobarbarry
- xinyuliu-jeffrey (CUHK)
- Yang-Yan-Yang-Yan
- ZhuochengZhang98 (University of Chinese Academy of Sciences)
- ZiangWu-77
- zqgong (Shenzhen)