Find my notes on each of the following papers in the markdown file:
- Mixtral of Experts
- Scalable and Efficient MoE Training for Multitask Multilingual Models
- ST-MoE
- Unified Scaling Laws for Routed Language Models
- Switch Transformers
- Limits of Transfer Learning with a Unified Text-to-Text Transformer
- Empirical Understanding of MoE Design Choices
- Outrageously Large Neural Networks
- MoE Cross-Example Aggregation
- Transferable Adversarial Robustness for Categorical Data via Universal Robust Embedding
- Hash Layers For Large Sparse Models
- How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections
- Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- Vision Mamba
- Repeat After Me: Transformers are Better than State Space Models at Copying
- Efficiently Modeling Long Sequences with Structured State Spaces
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models
- MambaByte
- Legendre Memory Units
- HiPPO Recurrent Memory