XueFuzhao/awesome-mixture-of-experts

Dynamic Mixture of Experts

QAQdev opened this issue · 2 comments

We propose *Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models* (DynMoE), which introduces a routing mechanism that lets each token activate a variable number of experts, together with a procedure for dynamically adjusting the number of experts during training.
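For readers skimming the list, here is a minimal sketch of what threshold-based "activate any number of experts" routing can look like. This is an illustrative approximation in PyTorch, not the authors' implementation (see LINs-lab/DynMoE for that); the class name, cosine-similarity scoring, and top-1 fallback are assumptions made for the sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopAnyGate(nn.Module):
    """Toy gate: each token activates every expert whose score clears a
    learnable per-expert threshold, so the number of experts used varies
    per token. Sketch only; not the DynMoE implementation."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.expert_emb = nn.Parameter(torch.randn(num_experts, d_model))
        # One learnable activation threshold per expert.
        self.threshold = nn.Parameter(torch.zeros(num_experts))

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model). Cosine similarity keeps scores in [-1, 1].
        scores = F.normalize(x, dim=-1) @ F.normalize(self.expert_emb, dim=-1).T
        mask = scores > self.threshold  # (num_tokens, num_experts), bool

        # Fallback: a token that clears no threshold gets its top-1 expert,
        # so every token is routed somewhere.
        no_expert = ~mask.any(dim=-1)
        if no_expert.any():
            mask[no_expert, scores[no_expert].argmax(dim=-1)] = True

        # Combine weights: softmax over the active experts only.
        weights = F.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)
        return mask, weights


gate = TopAnyGate(d_model=16, num_experts=4)
mask, weights = gate(torch.randn(8, 16))
print(mask.sum(dim=-1))  # experts activated per token: varies, always >= 1
```

The sketch only covers the per-token routing side; in the paper the thresholds are trained (which needs an auxiliary objective, omitted here), and the auto-tuning procedure also changes the expert count itself during training, roughly by growing the pool when many tokens activate no expert and pruning experts that are rarely used.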

I think this may match your interest in adaptive computation! Please consider including this paper.

We also provide our implementation at LINs-lab/DynMoE.

Sure! Submitted a PR.