proger/moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
Language: Python · License: MIT