MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
https://arxiv.org/abs/2410.14574
- pytorch
- fastmoe: https://github.com/laekov/fastmoe
- The toolkit supports Weights & Biases for monitoring jobs. If you use it, also install wandb.
- Download the WikiText-103 dataset from here, then update the data paths in the bash scripts to match your local setup. The data directory should be laid out as follows:
data_directory/
└── wikitext-103
    ├── test.txt
    ├── train.txt
    └── valid.txt
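Before launching a run, it can help to confirm that the dataset is laid out as shown above. The following is a minimal sketch, not part of the repository: `check_wikitext` is a hypothetical helper name, and it only assumes the three file names listed in the tree.

```shell
# check_wikitext DIR: verify that DIR/wikitext-103 contains the three split files.
# Hypothetical helper for illustration; it is not shipped with this repository.
check_wikitext() {
    data_dir="$1"
    missing=0
    for split in train valid test; do
        file="$data_dir/wikitext-103/$split.txt"
        if [ ! -f "$file" ]; then
            # Report every missing file rather than stopping at the first one.
            echo "missing: $file"
            missing=1
        fi
    done
    # Return 0 when all files are present, 1 otherwise.
    return "$missing"
}
```

For example, `check_wikitext data_directory && echo "layout OK"` prints a confirmation only when all three files are present.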
bash scripts/smoe-s.sh
bash scripts/smoe-m.sh
bash scripts/smoe-l.sh
bash scripts/smoe-mom-s.sh
bash scripts/smoe-mom-m.sh
bash scripts/smoe-mom-l.sh
bash scripts/smoe-adam-m.sh
bash scripts/glam-m.sh
bash scripts/glam-mom-m.sh
bash scripts/glam-adam-m.sh
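To run several of the scripts above back to back, a small wrapper along these lines may be convenient. This is only a sketch under stated assumptions: `run_all` is a hypothetical helper, and the `logs/` output directory is an arbitrary choice, not something the repository defines.

```shell
# run_all SCRIPT...: run each script in order and stop at the first failure.
# Each run's output goes to logs/<script-name>.log (hypothetical location).
run_all() {
    mkdir -p logs
    for script in "$@"; do
        name=$(basename "$script" .sh)
        echo "running $script"
        # Capture stdout and stderr so a failed run can be inspected later.
        if ! bash "$script" > "logs/$name.log" 2>&1; then
            echo "failed: $script (see logs/$name.log)"
            return 1
        fi
    done
}
```

For example, `run_all scripts/smoe-m.sh scripts/smoe-mom-m.sh scripts/smoe-adam-m.sh` would run the three medium SMoE scripts in sequence.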
- To log a run to Weights & Biases, add these flags to the bash script, replacing test with your own project and job names:
--wandb
--project-name test
--job-name test