OpenMoE is a project aimed at igniting the open-source MoE community! We are releasing a family of open-source Mixture-of-Experts (MoE) large language models.
Since we are a small team working on a huge project, we cannot handle everything ourselves. Instead, we release some intermediate checkpoints in this repo to invite more contributors to work on open-source MoE together!
[2023/08] 🔥 We released an intermediate OpenMoE-8B checkpoint (OpenMoE-v0.2) along with two other models. Check out the blog post.
- PyTorch Implementation with Colossal AI
- More Evaluation
- Continue Training to 1T tokens
- Paper
Currently, three models have been released in total.
Model Name | Description | #Param | GCS | Huggingface | Gin File |
---|---|---|---|---|---|
OpenMoE-base/16E | A small MoE model for debugging | 637M | gs://openmoe/openmoe-base/checkpoint_500000 | Link | Link |
OpenLLaMA-base | A dense counterpart of OpenMoE-base | 310M | gs://openmoe/openllama-base/checkpoint_500000 | Link | Link |
OpenMoE-8B/32E | An 8B MoE with FLOPs comparable to a 2B LLaMA | 8B | gs://openmoe/openmoe-8b/checkpoint_100000 | Link | Link |
We release all these checkpoints on Huggingface and Google Cloud Storage. For instance, you can download OpenMoE-8B with:
gsutil cp -r gs://openmoe/openmoe-8b/checkpoint_100000 $YOUR_DIR
The base models are trained on 128B tokens. The OpenMoE-8B checkpoint, with 4 MoE layers and 32 experts, has been trained on 200B tokens, and training is still ongoing. If you are interested in the latest checkpoint, please feel free to email Fuzhao (f.xue@u.nus.edu). In addition, we are highly interested in training this model until saturation by performing multi-epoch training, which means we may train it on over 2T tokens (depending on the resources we can get in the coming months).
Note: downloading data from Google Cloud Storage is not free, but you can sign up for Google Cloud and get some free credits.
Get a TPU VM and run the following commands on all TPUs. Researchers can apply to the TPU Research Cloud program for TPU resources.
We are working on the PyTorch + GPU implementation with Colossal AI.
git clone https://github.com/XueFuzhao/OpenMoE.git
bash OpenMoE/script/run_pretrain.sh
Get a TPU VM and run the following commands on all TPUs.
git clone https://github.com/XueFuzhao/OpenMoE.git
bash OpenMoE/script/run_eval.sh
50% The RedPajama + 50% The Stack Dedup. We use a high ratio of coding data to improve reasoning ability.
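As a rough illustration of this 50/50 mixture, the sketch below samples a source dataset for each training sequence according to the mixture weights (hypothetical helper names; the real pipeline lives in the T5x configs, not in this snippet):

```python
import random

# Hypothetical sketch of the 50/50 data mixture, NOT the actual OpenMoE pipeline.
# Each training sequence draws its source dataset with the mixture probability.
SOURCES = ["redpajama", "stack_dedup"]
WEIGHTS = [0.5, 0.5]

def pick_source(rng: random.Random) -> str:
    """Sample one source according to the mixture weights."""
    r = rng.random()
    acc = 0.0
    for src, w in zip(SOURCES, WEIGHTS):
        acc += w
        if r < acc:
            return src
    return SOURCES[-1]

rng = random.Random(0)
counts = {s: 0 for s in SOURCES}
for _ in range(10_000):
    counts[pick_source(rng)] += 1
print(counts)  # roughly 5000 / 5000
```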
We use the umT5 tokenizer to support multilingual continual learning in the future; it can be downloaded from Huggingface or Google Cloud.
OpenMoE is based on ST-MoE but uses a decoder-only architecture. The detailed implementation can be found in Fuzhao's T5x and Flaxformer repos.
We use a modified UL2 training objective but with a causal attention mask (we use more prefix LM data and higher mask ratios because they save computation):
- 50% prefix LM
- 10% span len=3, mask ratio=0.15
- 10% span len=8, mask ratio=0.15
- 10% span len=3, mask ratio=0.5
- 10% span len=8, mask ratio=0.5
- 10% span len=64, mask ratio=0.5
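The objective mixture above can be sketched as a sampler that maps a uniform draw to one objective via cumulative weights (illustrative only; the real sampling is configured in the T5x/Flaxformer gin files):

```python
# Hedged sketch of the UL2-style objective mixture listed above.
OBJECTIVES = [
    (0.50, {"name": "prefix_lm"}),
    (0.10, {"name": "span", "mean_len": 3, "mask_ratio": 0.15}),
    (0.10, {"name": "span", "mean_len": 8, "mask_ratio": 0.15}),
    (0.10, {"name": "span", "mean_len": 3, "mask_ratio": 0.5}),
    (0.10, {"name": "span", "mean_len": 8, "mask_ratio": 0.5}),
    (0.10, {"name": "span", "mean_len": 64, "mask_ratio": 0.5}),
]

def sample_objective(u: float) -> dict:
    """Map a uniform draw u in [0, 1) to one objective via cumulative weights."""
    acc = 0.0
    for prob, cfg in OBJECTIVES:
        acc += prob
        if u < acc:
            return cfg
    return OBJECTIVES[-1][1]

assert abs(sum(p for p, _ in OBJECTIVES) - 1.0) < 1e-9  # weights sum to 1
print(sample_objective(0.25))  # → {'name': 'prefix_lm'}
```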
We use RoPE, the SwiGLU activation, and a 2K context length. We will release a more detailed report soon.
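For reference, a minimal sketch of a SwiGLU feed-forward block is shown below (toy dimensions and weights chosen for illustration; the actual OpenMoE implementation is in the Flaxformer repo):

```python
import math

# Minimal SwiGLU FFN sketch: down( silu(x @ w_gate) * (x @ w_up) ).
# Toy weights for illustration only, not the actual model parameters.
def silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    """x: [d_model], w_gate/w_up: [d_model][d_ff], w_down: [d_ff][d_model]."""
    d_ff = len(w_gate[0])
    gate = [silu(sum(x[i] * w_gate[i][j] for i in range(len(x)))) for j in range(d_ff)]
    up = [sum(x[i] * w_up[i][j] for i in range(len(x))) for j in range(d_ff)]
    hidden = [g * u for g, u in zip(gate, up)]  # elementwise gating
    d_model = len(w_down[0])
    return [sum(hidden[j] * w_down[j][k] for j in range(d_ff)) for k in range(d_model)]

# Tiny example: d_model=2, d_ff=3.
x = [1.0, -1.0]
w_gate = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
w_up = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
w_down = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
y = swiglu_ffn(x, w_gate, w_up, w_down)
print(y)  # a length-2 output vector
```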
We evaluate our model on TriviaQA and BigBench-Lite as a first step. We plot the cost-effectiveness curve in the figure below.
Relative cost is approximated by multiplying activated parameters by training tokens. The size of each dot denotes the number of activated parameters per token; the light-gray dots denote the total parameters of the MoE models.
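As a concrete instance of this approximation (hypothetical numbers: we assume OpenMoE-8B activates roughly 2B parameters per token, per the table's "comparable FLOPs of a 2B LLaMA" description, and compare against a hypothetical dense 8B model trained on the same 200B tokens):

```python
# Relative cost ≈ activated parameters × training tokens (the approximation above).
# The ~2B activated-parameter figure is an assumption for illustration,
# not an official number.
def relative_cost(activated_params: float, training_tokens: float) -> float:
    return activated_params * training_tokens

openmoe_8b = relative_cost(2e9, 200e9)  # assumed ~2B activated params, 200B tokens
dense_8b = relative_cost(8e9, 200e9)    # hypothetical dense 8B model, same tokens
print(dense_8b / openmoe_8b)  # → 4.0
```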
For more detailed results, please see our blog post.
Our code is under Apache 2.0 License.
Since the models are trained on the RedPajama and The Stack Dedup datasets, please check the licenses of these two datasets before using the model.
The following authors currently contribute to this project:
Please cite the repo if you use the model and code in this repo.
@misc{openmoe2023,
author = {Fuzhao Xue and Zian Zheng and Yao Fu and Jinjie Ni and Zangwei Zheng and Wangchunshu Zhou and Yang You},
title = {OpenMoE: Open Mixture-of-Experts Language Models},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/XueFuzhao/OpenMoE}},
}