Dynamic Mixture of Experts
QAQdev opened this issue · 2 comments
QAQdev commented
We propose Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models (https://arxiv.org/abs/2405.14297), a routing mechanism that allows a variable number of experts per token, together with a procedure for dynamically adjusting the number of experts during training.
I think this may match your interest in adaptive computation! Please kindly consider including this paper.
We also provide our implementation at LINs-lab/DynMoE (https://github.com/LINs-lab/DynMoE).
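For a quick intuition, the sketch below shows a "top-any" style gate in PyTorch: each token activates every expert whose gate score clears a learnable per-expert threshold, so the number of active experts varies per token. This is only an illustrative simplification of the idea described above (the class names, sigmoid scoring, and the way thresholds are handled here are assumptions of this sketch, not the DynMoE implementation); please refer to LINs-lab/DynMoE for the actual code.

```python
import torch
import torch.nn as nn


class TopAnyGate(nn.Module):
    """Route each token to every expert whose gate score exceeds a
    learnable per-expert threshold, so tokens may use a variable number
    of experts. Simplified sketch, not the DynMoE implementation."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, num_experts, bias=False)
        # One learnable threshold per expert (compared in sigmoid space).
        self.threshold = nn.Parameter(torch.zeros(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model) -> scores: (tokens, num_experts) in (0, 1)
        scores = torch.sigmoid(self.scorer(x))
        # Variable k per token: keep every expert above its threshold.
        mask = scores > torch.sigmoid(self.threshold)
        weights = scores * mask
        # Renormalize per token; a real implementation would also handle
        # the case where no expert clears its threshold (e.g. fall back
        # to top-1) and use the mask statistics to add/remove experts
        # during training.
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return weights  # (tokens, num_experts)


class DynamicMoELayer(nn.Module):
    """MoE layer that mixes expert outputs with the variable-k gate above."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.gate = TopAnyGate(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.gate(x)                                          # (tokens, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # (tokens, E, d_model)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)          # (tokens, d_model)


if __name__ == "__main__":
    layer = DynamicMoELayer(d_model=64, d_hidden=128, num_experts=4)
    tokens = torch.randn(10, 64)
    print(layer(tokens).shape)  # torch.Size([10, 64])
```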
XueFuzhao commented
Hi,
Sounds cool!
Could you please submit a PR to add your paper?
Best,
Fuzhao
QAQdev commented
Sure! Submitted a PR.