/MHMoE

Community implementation of the paper "Multi-Head Mixture-of-Experts" in PyTorch.

Primary language: Python. License: MIT.
