Issues
Early loss divergence for upcycling
#15 opened by yazdayy - 5
Dropout Regularization in expert modules
#14 opened by taehyunzzz - 3
Config Naming
#16 opened by mchorton - 1
Supported generative tasks
#13 opened by taehyunzzz - 1
recommended conf
#12 opened by raingart - 1
How to get the MMLU results in Table 4?
#11 opened by mathfinder - 1
Implementing MoE Sparse Upcycling
#9 opened by adumans - 13
llama.cpp / GGUF support
#7 opened by sammcj - 7
Tokenized dataset?
#10 opened by joelburget - 1
MOE Export Parallelism Training Script
#8 opened by wdlctc - 5