ZhenweiAn/Dynamic_MoE
Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models"
Python
Issues
Where is the code for loss_b and loss_d as described in Equations (5) and (6) of your paper?
#3 opened by XL2248
Hello, could you upload the training code?
#2 opened by puppy2000
Why is num_experts=-1?
#1 opened by jiangsongtao