/muMoE

[NeurIPS'24] Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization

Primary Language: Python
