Community implementation of the paper "Multi-Head Mixture-of-Experts" in PyTorch
Primary language: Python. License: MIT.