Parameter-Efficient-MoE

Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks


Introduction

We present Parameter-Efficient Sparsity Crafting to help dense models learn knowledge from different fields (including code and math). This approach performs instruction tuning and uses the MoE structure efficiently.

Parameter-Efficient Sparsity Crafting uses parameter-efficient techniques, including QLoRA and Adapter, to perform Efficient Sparse Upcycling.
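
To make the idea of adapter-based experts more concrete, here is a minimal, pure-Python sketch of a generic top-k routed mixture of adapter experts sitting on top of a shared feed-forward layer. It is an illustration only and not the code in this repo: the names `moe_adapter_forward`, `shared_ffn`, and `adapters`, the choice of `top_k=2`, and the way the adapters are wired relative to the FFN are all assumptions made for this example.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_adapter_forward(
    hidden: Vector,
    shared_ffn: Callable[[Vector], Vector],
    adapters: Sequence[Callable[[Vector], Vector]],
    router_logits: Sequence[float],
    top_k: int = 2,
) -> Vector:
    """Add a gate-weighted mix of the top-k adapter outputs to the shared FFN output.

    Hypothetical helper: the wiring here is an assumption, not the repo's layer.
    """
    probs = softmax(router_logits)
    # Indices of the k experts with the highest routing probability.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = shared_ffn(hidden)  # dense weights shared by every expert
    for i in chosen:
        weight = probs[i] / norm  # renormalize over the selected experts
        delta = adapters[i](hidden)
        out = [o + weight * d for o, d in zip(out, delta)]
    return out


# Toy usage: an identity FFN and four adapters that each emit a constant offset.
ffn = lambda h: list(h)
experts = [lambda h, c=c: [c for _ in h] for c in (1.0, 2.0, 3.0, 4.0)]
result = moe_adapter_forward([0.5, -0.5], ffn, experts, router_logits=[0.0, 0.0, 1.0, 1.0])
print(result)
```

Because only the small adapters and the router differ between experts in this sketch, the number of extra parameters grows with the adapter size rather than with the size of the dense FFN, which is the intuition behind calling the approach parameter-efficient.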

The repo supports the training of dense models (LLaMA 2, Yi, Qwen1.5, etc.).

Model Lists

| Camelidae Series | Download |
|---|---|
| Camelidae-8x7B | 🤗 HuggingFace |
| Camelidae-8x13B | 🤗 HuggingFace |
| Camelidae-8x34B | 🤗 HuggingFace |
| Camelidae-8x34B-pro | 🤗 Coming Soon |

| Qwen2idae Series | Download |
|---|---|
| Qwen2idae-16x14B-v1.0 | 🤗 HuggingFace |
| Qwen2idae-16x7B-v1.0 | 🤗 Coming Soon |
| Qwen2idae-16x1.8B-v1.0 | 🤗 Coming Soon |

Performance

| Model | Activated Params | MMLU (5-shot) | GSM8k (5-shot) | MATH (4-shot) | HumanEval (0-shot) | MBPP (4-shot) | HellaSwag (10-shot) |
|---|---|---|---|---|---|---|---|
| GPT3.5 | - | 70.0% | 57.1% | **34.1%** | **48.1%** | - | **85.5%** |
| LLaMA2-70B-chat | 70B | 63.8% | 59.3% | 10.4% | 32.3% | 35.6% | 84.8% |
| Camelidae-8x34B-pro | 35B | **75.7%** | **79.4%** | **24.0%** | **48.8%** | **43.2%** | 85.2% |
| Camelidae-8x34B | 35B | **75.6%** | **78.3%** | 22.6% | 43.9% | **41.4%** | **85.3%** |
| SUSChat-34B | 34B | **76.4%** | 72.3% | 22.0% | 11.6% | 40.2% | 83.9% |
| Yi-34B-chat | 34B | 74.8% | 67.6% | 17.3% | 20.1% | 41.0% | 83.9% |
| Qwen2idae-16x14B-v1.0 | 15B | 66.7% | **77.8%** | **29.9%** | **62.8%** | **48.6%** | 82.3% |
| Mixtral-8x7B-instruct | 14B | 68.7% | 71.7% | 22.1% | 25.6% | 40.6% | **86.5%** |
| Camelidae-8x13B | 13B | 54.4% | 52.6% | 9.8% | 30.6% | 30.4% | 82.5% |
| LLaMA2-13B-chat | 13B | 53.9% | 37.1% | 5.2% | 18.9% | 27.2% | 81.9% |
| Camelidae-8x7B | 7B | 48.3% | 44.0% | 5.8% | 18.3% | 23.4% | 79.2% |
| LLaMA2-7B-chat | 7B | 47.2% | 26.3% | 3.9% | 12.2% | 17.6% | 78.6% |

The top three scores in each column, taken across all models, are shown in bold.
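
The per-column ranking can be reproduced with a few lines of Python. The sketch below copies only the GSM8k and HumanEval columns from the table; the `scores` dictionary and the `top_n` helper are hypothetical names introduced for this example, and "top three" is assumed to mean the three highest values in a column across all listed models.

```python
from typing import Dict, List, Tuple

# Scores copied from the performance table (GSM8k and HumanEval columns only).
scores: Dict[str, Dict[str, float]] = {
    "GPT3.5": {"GSM8k": 57.1, "HumanEval": 48.1},
    "LLaMA2-70B-chat": {"GSM8k": 59.3, "HumanEval": 32.3},
    "Camelidae-8x34B-pro": {"GSM8k": 79.4, "HumanEval": 48.8},
    "Camelidae-8x34B": {"GSM8k": 78.3, "HumanEval": 43.9},
    "SUSChat-34B": {"GSM8k": 72.3, "HumanEval": 11.6},
    "Yi-34B-chat": {"GSM8k": 67.6, "HumanEval": 20.1},
    "Qwen2idae-16x14B-v1.0": {"GSM8k": 77.8, "HumanEval": 62.8},
    "Mixtral-8x7B-instruct": {"GSM8k": 71.7, "HumanEval": 25.6},
    "Camelidae-8x13B": {"GSM8k": 52.6, "HumanEval": 30.6},
    "LLaMA2-13B-chat": {"GSM8k": 37.1, "HumanEval": 18.9},
    "Camelidae-8x7B": {"GSM8k": 44.0, "HumanEval": 18.3},
    "LLaMA2-7B-chat": {"GSM8k": 26.3, "HumanEval": 12.2},
}


def top_n(column: str, n: int = 3) -> List[Tuple[str, float]]:
    """Return the n best (model, score) pairs for one benchmark column."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1][column], reverse=True)
    return [(name, vals[column]) for name, vals in ranked[:n]]


for column in ("GSM8k", "HumanEval"):
    print(column, top_n(column))
```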

Usage

Camelidae

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hywu/Camelidae-8x34B", trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained("hywu/Camelidae-8x34B", device_map="auto", trust_remote_code=True).eval()

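# Tokenize a single-turn prompt in the "### Human / ### Assistant" format.
inputs = tokenizer('### Human:\nHow are you?\n### Assistant:\n', return_tensors='pt')
# Move the input tensors to the model's device.
inputs = inputs.to(model.device)
# Generate a reply and decode it back to text.
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))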
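
The Camelidae prompt above follows a `### Human:` / `### Assistant:` layout. As a minimal sketch, assuming single-turn use only, a small helper can build that string for any question; the function name `build_camelidae_prompt` is hypothetical and not part of this repo.

```python
def build_camelidae_prompt(question: str) -> str:
    """Wrap one user question in the Human/Assistant layout from the usage example."""
    # Hypothetical helper: reproduces only the single-turn layout shown in the example.
    return f"### Human:\n{question}\n### Assistant:\n"


prompt = build_camelidae_prompt("How are you?")
print(repr(prompt))
```

The returned string can be passed to `tokenizer(..., return_tensors='pt')` in place of the hard-coded prompt.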

Qwen2idae

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hywu/Qwen2idae-16x14B-v1.0", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("hywu/Qwen2idae-16x14B-v1.0", device_map="auto", trust_remote_code=True).eval()

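# Tokenize a single-turn prompt delimited by <|im_start|> / <|im_end|> markers.
inputs = tokenizer('<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n', return_tensors='pt')
# Move the input tensors to the model's device.
inputs = inputs.to(model.device)
# Generate a reply and decode it back to text.
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))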
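
The Qwen2idae prompt above wraps each turn in `<|im_start|>` and `<|im_end|>` markers. A minimal sketch of a helper that builds the same single-turn string follows; the name `build_qwen2idae_prompt` is hypothetical, and multi-turn or system-prompt handling is deliberately left out because the example does not show it.

```python
def build_qwen2idae_prompt(question: str) -> str:
    """Wrap one user question in the <|im_start|>/<|im_end|> layout from the usage example."""
    # Hypothetical helper: covers only the single user turn shown in the example.
    return (
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_qwen2idae_prompt("How are you?")
print(repr(prompt))
```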

Citation

@article{wu2024parameter,
  title={Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks},
  author={Wu, Haoyuan and Zheng, Haisheng and He, Zhuolun and Yu, Bei},
  journal={arXiv preprint arXiv:2401.02731},
  year={2024}
}

License

The source code in this repo is licensed under the Apache 2.0 License. Camelidae and Qwen2idae models are developed for academic research and free commercial use; all usage must adhere to the licenses from facebookresearch, 01-ai, and Qwen1.5.