MomentumSMoE

Implementation for MomentumSMoE

MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts

https://arxiv.org/abs/2410.14574

Prerequisites

Usage

Prepare the WikiText-103 dataset:

  • Download the WikiText-103 dataset from here, then update the bash scripts to point to your local data path. The expected layout is shown below, followed by a quick sanity-check sketch.
data_directory/
    └── wikitext-103
        ├── test.txt
        ├── train.txt
        └── valid.txt
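
If you want to verify the layout before launching a run, here is a minimal Python check; data_directory is a placeholder for your own path, as in the tree above.

import os

# Placeholder path: point this at the same directory your bash scripts use.
data_dir = "data_directory/wikitext-103"

for split in ("train.txt", "valid.txt", "test.txt"):
    path = os.path.join(data_dir, split)
    assert os.path.isfile(path), f"missing {path}"
print("WikiText-103 layout looks correct.")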

Pretraining SMoE (Switch Transformer) on WikiText-103:

bash scripts/smoe-s.sh
bash scripts/smoe-m.sh
bash scripts/smoe-l.sh
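
For orientation, a minimal PyTorch sketch of the Switch-style SMoE layer these baselines train: each token is routed to its top-1 expert by a learned gate. TinySMoE and all sizes here are illustrative stand-ins, not the repo's actual classes.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)
        score, idx = probs.max(dim=-1)  # top-1 routing per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = score[mask].unsqueeze(-1) * expert(x[mask])
        return out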

Pretraining MomentumSMoE on WikiText-103:

bash scripts/smoe-mom-s.sh
bash scripts/smoe-mom-m.sh
bash scripts/smoe-mom-l.sh
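
The momentum variant, as described in the paper, treats the residual SMoE update x ← x + MoE(x) as a gradient step and carries a heavy-ball momentum buffer across layers. A hedged sketch of one layer step; the signs and default values are my reading of the paper, not necessarily the repo's:

# mu: momentum coefficient, gamma: step size. With mu = 0 and gamma = 1
# this reduces to the plain residual SMoE update x + moe_layer(x).
def momentum_smoe_step(x, p, moe_layer, mu=0.9, gamma=1.0):
    p = mu * p - moe_layer(x)  # heavy-ball buffer over the MoE "gradient"
    x = x - gamma * p          # momentum-augmented residual update
    return x, p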

Pretraining AdamSMoE on WikiText-103:

bash scripts/smoe-adam-m.sh
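
AdamSMoE swaps the heavy-ball buffer for an Adam-style adaptive step over the same MoE "gradient". Again a sketch; the repo's exact moment estimates, bias correction, and defaults may differ:

def adam_smoe_step(x, m, v, moe_layer, beta1=0.9, beta2=0.999, gamma=1.0, eps=1e-8):
    g = -moe_layer(x)                    # MoE output as a negative gradient
    m = beta1 * m + (1 - beta1) * g      # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
    x = x - gamma * m / (v.sqrt() + eps) # Adam-style layer update
    return x, m, v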

Pretraining GLaM on WikiText-103:

bash scripts/glam-m.sh
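
GLaM differs from the Switch baseline mainly in its layer pattern, interleaving MoE blocks with dense feed-forward blocks (and using top-2 rather than top-1 gating in the original GLaM). A rough sketch of the interleaving, reusing the TinySMoE stand-in from above; dense_block is likewise hypothetical, not the repo's module:

import torch.nn as nn

def dense_block(d_model):  # hypothetical dense FFN stand-in
    return nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))

def build_glam_stack(n_layers=4, d_model=64, n_experts=4):
    # Alternate dense FFN blocks with MoE blocks, GLaM-style.
    return nn.Sequential(*[
        TinySMoE(d_model, 4 * d_model, n_experts) if i % 2 else dense_block(d_model)
        for i in range(n_layers)
    ])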

Pretraining MomentumGLaM on WikiText-103:

bash scripts/glam-mom-m.sh

Pretraining AdamGLaM on WikiText-103:

bash scripts/glam-adam-m.sh

Wandb support:

  • Add these flags to the bash scripts, filling in your own project and job names (a sketch of how they are typically consumed follows the flags):
--wandb 
--project-name test 
--job-name test
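
For reference, wandb.init's project and name arguments are the real Weights & Biases API; the argparse wiring below is an illustrative sketch, not necessarily the repo's exact parser.

import argparse
import wandb

parser = argparse.ArgumentParser()
parser.add_argument("--wandb", action="store_true")
parser.add_argument("--project-name", default="test")
parser.add_argument("--job-name", default="test")
args = parser.parse_args()

if args.wandb:
    wandb.init(project=args.project_name, name=args.job_name)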