lucidrains/st-moe-pytorch
Implementation of ST-MoE, the latest incarnation of mixture-of-experts (MoE) after years of research at Google Brain, in Pytorch
Python · MIT License
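For orientation before the issue list, here is a minimal usage sketch in the spirit of the README. The exact `MoE` constructor arguments and return signature may differ between versions, so verify against the installed release.

```python
# Minimal usage sketch (mirrors the README at time of writing; verify
# argument names and return signature against the installed version).
import torch
from st_moe_pytorch import MoE

moe = MoE(
    dim = 512,
    num_experts = 16,     # more experts add parameters without adding per-token compute
    gating_top_n = 2      # route each token to its top-2 experts
)

inputs = torch.randn(4, 1024, 512)   # (batch, sequence, dim)

# the module returns the output plus auxiliary losses for training the router
out, total_aux_loss, balance_loss, router_z_loss = moe(inputs)
total_aux_loss.backward()
```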
Issues
Top-n gating bug
#13 opened by lihenglin - 0
Question on the experts' input
#12 opened by mrqorib - 1
Question: How should I use the MoE argument `experts: Optional[Module] = None`
#11 opened by luizcoroo - 1
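A hypothetical sketch of what passing that argument might look like, assuming `experts` accepts the per-expert networks to route between, each mapping tensors of shape (..., dim) back to (..., dim). This is a guess at the contract, not confirmed against the source; check `MoE.__init__` for the actual expected type.

```python
# Hypothetical: supplying custom experts instead of the built-in feedforwards.
# ASSUMPTION: `experts` accepts the per-expert networks, each (..., dim) -> (..., dim).
# Verify against st_moe_pytorch's MoE.__init__ before relying on this.
import torch
from torch import nn
from st_moe_pytorch import MoE

dim = 512
num_experts = 8

custom_experts = nn.ModuleList([
    nn.Sequential(
        nn.Linear(dim, dim * 4),
        nn.GELU(),
        nn.Linear(dim * 4, dim)
    ) for _ in range(num_experts)
])

moe = MoE(dim = dim, num_experts = num_experts, experts = custom_experts)
```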
differentiable top k
#7 opened by wangzizhao - 0
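The straight-through trick is one common way to keep top-k routing trainable: use a hard, sparse top-k mask in the forward pass while letting gradients flow through the dense softmax gates in the backward pass. A generic sketch of that idea follows; it is not st-moe-pytorch's actual router.

```python
# Generic straight-through top-k gating sketch (illustrative, not this library's router).
import torch
import torch.nn.functional as F

def straight_through_top_k_gates(logits: torch.Tensor, k: int) -> torch.Tensor:
    """logits: (..., num_experts) -> sparse gates, differentiable w.r.t. all logits."""
    soft = F.softmax(logits, dim = -1)                # dense, differentiable gates

    topk_vals, topk_idx = soft.topk(k, dim = -1)
    hard = torch.zeros_like(soft).scatter(-1, topk_idx, topk_vals)
    hard = hard / hard.sum(dim = -1, keepdim = True)  # renormalize the kept gates

    # forward value equals the hard sparse gates; gradients follow the soft gates
    return hard + soft - soft.detach()

gates = straight_through_top_k_gates(torch.randn(4, 16, requires_grad = True), k = 2)
gates.sum().backward()                                # gradients reach every logit
```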
Seeking Help on Loss Behavior
#6 opened by guanidine - 1
Duplicate
#5 opened by David-Archer-us - 1
import pad_dim_to
#4 opened by David-Archer-us - 5
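For context, `pad_dim_to` is an internal helper of the library. Below is a generic reconstruction of what such a helper typically does, zero-padding a tensor along one dimension up to a target length; it may differ from the library's exact code.

```python
# Generic reconstruction of a pad-to-length helper (may differ from the
# library's internal pad_dim_to).
import torch
import torch.nn.functional as F

def pad_dim_to(t: torch.Tensor, length: int, dim: int = -1) -> torch.Tensor:
    """Zero-pad tensor `t` along `dim` so that t.shape[dim] == length."""
    pad_length = length - t.shape[dim]
    if pad_length <= 0:
        return t
    # F.pad's pad tuple is ordered from the last dimension backwards
    dims_after = (t.ndim - dim - 1) if dim >= 0 else (-dim - 1)
    return F.pad(t, (0, 0) * dims_after + (0, pad_length))

x = torch.randn(2, 3, 5)
print(pad_dim_to(x, 8, dim = 1).shape)   # torch.Size([2, 8, 5])
```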
About gating_top_n
#3 opened by Heihaierr - 1
Implementing LIMoE from Google Zurich
#2 opened by prateeky2806