[Preprint] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Primary language: Python. License: Apache License 2.0 (Apache-2.0).