karpathy/llama2.c

Could anyone port deepseek-moe to llama2.c?

win10ogod opened this issue · 0 comments

DeepSeek-MoE uses a mixture-of-experts (MoE) architecture built on two main strategies: fine-grained expert segmentation and shared expert isolation.
https://github.com/deepseek-ai/DeepSeek-MoE/tree/main
https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat