davidmrau/mixture-of-experts
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
Python · GPL-3.0
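For orientation, here is a minimal usage sketch of a sparsely-gated MoE layer in PyTorch. The module path `moe`, the class name `MoE`, its constructor arguments, and the `(output, aux_loss)` return value are assumptions based on typical implementations of the paper, not details confirmed by this listing; check the repository's README for the actual interface.

```python
import torch
import torch.nn.functional as F
from moe import MoE  # assumed module and class name

# Hypothetical hyperparameters: 1000-dim inputs, 20-dim outputs,
# 10 experts with hidden size 64, each example routed to its top-4 experts.
model = MoE(input_size=1000, output_size=20, num_experts=10,
            hidden_size=64, noisy_gating=True, k=4)

x = torch.rand(32, 1000)  # a batch of 32 examples
model.train()
y, aux_loss = model(x)    # assumed to return the layer output and a load-balancing loss

# The auxiliary loss is added to the task loss so that routing stays balanced
# across experts (zero target used here as a placeholder, for illustration only).
loss = F.mse_loss(y, torch.zeros_like(y)) + aux_loss
loss.backward()
```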
Issues
- A question for changing input size of moe (#28, opened by jhxu003, 0 comments)
- For def _prob_in_top_k (#31, opened by Brankozz, 1 comment)
- MoE for transformers (#24, opened by elias-ramzi, 10 comments)
- Zero Grad of w_gate (#27, opened by panmianzhi, 0 comments)
- requires_grad = True not required for a variable under combine() method? (#26, opened by doppiomovimento, 4 comments)
- have you ever meet such trend of loss? (#22, opened by lonelyqian, 0 comments)
- How to use this layer in a sequence setting? (#25, opened by agupta54, 0 comments)
- some questions about the code (#23, opened by hanruisong00, 1 comment)
- Why not gpu? (#17, opened by chengjiaxiangbytedance, 4 comments)
- cv_squared (#2, opened by caoshijie0501, 1 comment; see the sketch after this list)
- why apply exp() log() in expert_out result in combine() function of SparseDispatcher class (#19, opened by Zrealshadow, 4 comments)
- regression task self.w_gate is nan (#16, opened by JieDengsc, 5 comments)
- Why logsoftmax in the expert's output? (#13, opened by sofiapvcp, 2 comments)
- multiple_by_gates after exp (#15, opened by yjw1029, 5 comments)
- Issue with gates parameters (#12, opened by elias-ramzi, 3 comments)
- Question about the noisy top-k gating (#11, opened by huangtinglin, 2 comments)
- about aux_loss (#10, opened by enterhuiche, 4 comments)
- Wrong Implementation in SparseDispatcher (#8, opened by Cascol-Chen, 4 comments)
- Examples of using real dataset (#3, opened by GabrielLin, 2 comments)
- Please add license file if open source (#6, opened by yellowlab9, 1 comment)
- Log and Exp- Space (#4, opened by StillerPatrick, 1 comment)
- Tutorial of using Tensorflow version (#1, opened by GabrielLin)
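Several of the issues above (for example #2 on cv_squared, #10 on aux_loss, and #11 on noisy top-k gating) touch on the load-balancing machinery described in the Shazeer et al. paper. The sketch below illustrates the idea only: a hypothetical cv_squared helper computes the squared coefficient of variation of the per-expert importance (the total gate weight each expert receives over a batch), and that quantity serves as an auxiliary loss. Names, shapes, and the 0.01 coefficient are illustrative assumptions, not the repository's exact code.

```python
import torch

def cv_squared(x, eps=1e-10):
    # Squared coefficient of variation: Var(x) / Mean(x)^2.
    # Small when the values in x are spread evenly across experts.
    if x.numel() <= 1:
        return torch.zeros((), dtype=x.dtype, device=x.device)
    return x.var() / (x.mean() ** 2 + eps)

# Hypothetical gate matrix: one row per example, one column per expert;
# in a top-k gate, most entries of each row would be zero.
gates = torch.rand(32, 10)

# "Importance" of an expert = total gate weight routed to it over the batch.
importance = gates.sum(dim=0)

# Auxiliary load-balancing loss, scaled by a small coefficient and added to the
# task loss so that no single expert monopolizes the routing.
aux_loss = 0.01 * cv_squared(importance)
```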