lucidrains/st-moe-pytorch
Implementation of ST-MoE, the latest incarnation of mixture-of-experts (MoE) after years of research at Google Brain, in Pytorch
Python · MIT License
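For orientation before the issue list, here is a minimal usage sketch in the spirit of the README. The exact `MoE` constructor arguments and return signature may differ between versions, so verify against the installed release.

```python
# Minimal usage sketch (mirrors the README at time of writing; verify
# argument names and return signature against the installed version).
import torch
from st_moe_pytorch import MoE

moe = MoE(
    dim = 512,
    num_experts = 16,     # more experts add parameters without adding per-token compute
    gating_top_n = 2      # route each token to its top-2 experts
)

inputs = torch.randn(4, 1024, 512)   # (batch, sequence, dim)

# the module returns the output plus auxiliary losses for training the router
out, total_aux_loss, balance_loss, router_z_loss = moe(inputs)
total_aux_loss.backward()
```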
Issues
Top-n gating bug
#13 opened by lihenglin - 0
Question on the experts' input
#12 opened by mrqorib - 1
Question: How should I use the MoE argument `experts: Optional[Module] = None`
#11 opened by luizcoroo - 1
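A hypothetical sketch of what passing that argument might look like, assuming `experts` accepts the per-expert networks to route between, each mapping tensors of shape (..., dim) back to (..., dim). This is a guess at the contract, not confirmed against the source; check `MoE.__init__` for the actual expected type.

```python
# Hypothetical: supplying custom experts instead of the built-in feedforwards.
# ASSUMPTION: `experts` accepts the per-expert networks, each (..., dim) -> (..., dim).
# Verify against st_moe_pytorch's MoE.__init__ before relying on this.
import torch
from torch import nn
from st_moe_pytorch import MoE

dim = 512
num_experts = 8

custom_experts = nn.ModuleList([
    nn.Sequential(
        nn.Linear(dim, dim * 4),
        nn.GELU(),
        nn.Linear(dim * 4, dim)
    ) for _ in range(num_experts)
])

moe = MoE(dim = dim, num_experts = num_experts, experts = custom_experts)
```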
differentiable top k
#7 opened by wangzizhao - 0
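The straight-through trick is one common way to keep top-k routing trainable: use a hard, sparse top-k mask in the forward pass while letting gradients flow through the dense softmax gates in the backward pass. A generic sketch of that idea follows; it is not st-moe-pytorch's actual router.

```python
# Generic straight-through top-k gating sketch (illustrative, not this library's router).
import torch
import torch.nn.functional as F

def straight_through_top_k_gates(logits: torch.Tensor, k: int) -> torch.Tensor:
    """logits: (..., num_experts) -> sparse gates, differentiable w.r.t. all logits."""
    soft = F.softmax(logits, dim = -1)                # dense, differentiable gates

    topk_vals, topk_idx = soft.topk(k, dim = -1)
    hard = torch.zeros_like(soft).scatter(-1, topk_idx, topk_vals)
    hard = hard / hard.sum(dim = -1, keepdim = True)  # renormalize the kept gates

    # forward value equals the hard sparse gates; gradients follow the soft gates
    return hard + soft - soft.detach()

gates = straight_through_top_k_gates(torch.randn(4, 16, requires_grad = True), k = 2)
gates.sum().backward()                                # gradients reach every logit
```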
Seeking Help on Loss Behavior
#6 opened by guanidine - 1
Duplicate
#5 opened by David-Archer-us - 1
import pad_dim_to
#4 opened by David-Archer-us - 5
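For context, `pad_dim_to` is an internal helper of the library. Below is a generic reconstruction of what such a helper typically does, zero-padding a tensor along one dimension up to a target length; it may differ from the library's exact code.

```python
# Generic reconstruction of a pad-to-length helper (may differ from the
# library's internal pad_dim_to).
import torch
import torch.nn.functional as F

def pad_dim_to(t: torch.Tensor, length: int, dim: int = -1) -> torch.Tensor:
    """Zero-pad tensor `t` along `dim` so that t.shape[dim] == length."""
    pad_length = length - t.shape[dim]
    if pad_length <= 0:
        return t
    # F.pad's pad tuple is ordered from the last dimension backwards
    dims_after = (t.ndim - dim - 1) if dim >= 0 else (-dim - 1)
    return F.pad(t, (0, 0) * dims_after + (0, pad_length))

x = torch.randn(2, 3, 5)
print(pad_dim_to(x, 8, dim = 1).shape)   # torch.Size([2, 8, 5])
```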
About gating_top_n
#3 opened by Heihaierr - 1
Implementing LIMoE from Google Zurich
#2 opened by prateeky2806