/moe_attention

Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
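Below is a minimal, illustrative sketch of the general mixture-of-experts attention idea the paper's title refers to: per-token gating selects a small subset of expert value projections, so only a few expert matrices run per token while query/key projections stay dense. This is not the repository's implementation or the paper's exact formulation; all names (`MoEAttentionSketch`, `n_experts`, `k`) are hypothetical, and the expert selection here is simplified for brevity.

```python
# Hypothetical sketch of mixture-of-experts attention (not this repo's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEAttentionSketch(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_experts: int = 4, k: int = 2):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.k = k
        # Dense query/key projections, sliced per head.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        # Bank of expert value projections shared across heads (simplification;
        # a per-head expert bank would be closer to the paper's setting).
        self.v_experts = nn.Parameter(torch.randn(n_experts, d_model, d_model) * 0.02)
        self.o_proj = nn.Linear(d_model, d_model)
        # Gate scoring experts per token; sigmoid, non-competitive selection.
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        H, Dh = self.n_heads, self.d_head

        q = self.q_proj(x).view(B, T, H, Dh).transpose(1, 2)   # (B, H, T, Dh)
        k = self.k_proj(x).view(B, T, H, Dh).transpose(1, 2)   # (B, H, T, Dh)

        # Score experts per token and keep the top-k of them.
        scores = torch.sigmoid(self.gate(x))                   # (B, T, E)
        topv, topi = scores.topk(self.k, dim=-1)                # (B, T, k)

        # Values from the selected experts only (computed densely here for
        # clarity; an efficient kernel would skip unselected experts).
        expert_v = torch.einsum('btd,edf->btef', x, self.v_experts)   # (B, T, E, D)
        sel_v = torch.gather(
            expert_v, 2, topi.unsqueeze(-1).expand(-1, -1, -1, D))    # (B, T, k, D)
        v = (sel_v * topv.unsqueeze(-1)).sum(dim=2)                   # (B, T, D)
        v = v.view(B, T, H, Dh).transpose(1, 2)                       # (B, H, T, Dh)

        attn = F.scaled_dot_product_attention(q, k, v)                # (B, H, T, Dh)
        out = attn.transpose(1, 2).reshape(B, T, D)
        return self.o_proj(out)


if __name__ == "__main__":
    layer = MoEAttentionSketch(d_model=64, n_heads=4)
    print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```

See the paper and the code in this repository for the actual SwitchHead formulation and its efficiency-oriented implementation.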

Primary language: Python · License: MIT
