XueFuzhao/awesome-mixture-of-experts

Dynamic Mixture of Experts

QAQdev opened this issue · 2 comments

We propose *Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models* (DynMoE), which introduces a routing mechanism that lets each token activate a variable number of experts, together with a procedure for dynamically adjusting the number of experts during training.
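For readers skimming the list, here is a minimal sketch of what threshold-based "activate any number of experts" routing can look like. This is an illustrative approximation in PyTorch, not the authors' implementation (see LINs-lab/DynMoE for that); the class name, cosine-similarity scoring, and top-1 fallback are assumptions made for the sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopAnyGate(nn.Module):
    """Toy gate: each token activates every expert whose score clears a
    learnable per-expert threshold, so the number of experts used varies
    per token. Sketch only; not the DynMoE implementation."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.expert_emb = nn.Parameter(torch.randn(num_experts, d_model))
        # One learnable activation threshold per expert.
        self.threshold = nn.Parameter(torch.zeros(num_experts))

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model). Cosine similarity keeps scores in [-1, 1].
        scores = F.normalize(x, dim=-1) @ F.normalize(self.expert_emb, dim=-1).T
        mask = scores > self.threshold  # (num_tokens, num_experts), bool

        # Fallback: a token that clears no threshold gets its top-1 expert,
        # so every token is routed somewhere.
        no_expert = ~mask.any(dim=-1)
        if no_expert.any():
            mask[no_expert, scores[no_expert].argmax(dim=-1)] = True

        # Combine weights: softmax over the active experts only.
        weights = F.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)
        return mask, weights


gate = TopAnyGate(d_model=16, num_experts=4)
mask, weights = gate(torch.randn(8, 16))
print(mask.sum(dim=-1))  # experts activated per token: varies, always >= 1
```

The sketch only covers the per-token routing side; in the paper the thresholds are trained (which needs an auxiliary objective, omitted here), and the auto-tuning procedure also changes the expert count itself during training, roughly by growing the pool when many tokens activate no expert and pruning experts that are rarely used.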

I think this may match your interest in adaptive computation! Please consider including this paper.

We also provide our implementation at LINs-lab/DynMoE.

Sure! Submitted a PR.