DynMoE

[Preprint] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Primary language: Python · License: Apache-2.0