deepseek-ai/DeepSeek-MoE

Will there be a performance comparison with llama-moe?

ccccj opened this issue · 1 comment

llama-moe:
https://github.com/pjlab-sys4nlp/llama-moe/tree/main

Or will a training framework be released with llama as the base model?

Our DeepSeekMoE model is trained from scratch; it is not resumed or initialized from LLaMA/LLaMA2 checkpoints.
As for absolute performance, please refer to our paper and the llama-moe paper for details.