deepseek-ai/DeepSeek-MoE
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Python · MIT License
Issues
Ablation studies for DeepSeekMoE
#42 opened by Psycoy · 0 comments
Hangs when used to train deepseek-v2-coder-lite-instruction
#41 opened by bao-xiaoyi · 6 comments
Will you open-source the DeepSeekMoE 2B model?
#16 opened by win10ogod · 0 comments
loss = 0 occurs during MoE training
#39 opened by AlenjandroWang · 0 comments
How is expert parallelism configured? Is there configuration code?
#38 opened by ninglonglong · 3 comments
Finetune with deepspeed: type mismatch
#35 opened by YeZiyi1998 · 0 comments
why <|EOT|> ?
#37 opened by BING-LLL · 0 comments
Close expert parallel in vllm
#36 opened by trebladev · 0 comments
Slow inference on a single A100-80G
#34 opened by Dreaming-world · 3 comments
How is MoE parallelism implemented?
#31 opened by YunxinLi · 1 comment
Hello, could you provide a quantization solution?
#21 opened by edisonzf2020 · 0 comments
No need to add epsilon 1e-20 in topk norm?
#33 opened by MARD1NO · 0 comments
Could you add a ModelScope link? It would be more convenient for those who cannot access Hugging Face
#32 opened by lll143653 · 3 comments
Hello, when will the MoE-145B you plan to open-source be uploaded?
#25 opened by win10ogod · 4 comments
Abnormal model output after finetuning
#28 opened by JustQJ · 2 comments
load errors
#24 opened by cooper12121 · 1 comment
Is finetuning on NPU devices currently supported?
#26 opened by Tyx-main · 1 comment
Can you provide inference versions of DeepSeek based on vLLM, DeepSpeed, and TensorRT-LLM?
#23 opened by Eutenacity · 1 comment
How to fully finetune MoE on multiple nodes
#12 opened by ftgreat · 1 comment
Do you have plans to support the llama.cpp project?
#15 opened by hqu-little-boy · 3 comments
Could you open-source the training project for reproducing the model architecture?
#7 opened by win10ogod · 1 comment
About flash_attn
#20 opened by GXKIM · 1 comment
Great work! Is there a WeChat discussion group?
#22 opened by dawson-chen · 1 comment
Will it compare performance with llama-moe?
#11 opened by ccccj · 1 comment
#feature request# DeepSeek-Moe for code
#8 opened by Xingxiangrui · 1 comment
Question about AddAuxiliaryLoss?
#17 opened by KaiWU5 · 3 comments
Does the open-source MoE model support Chinese?
#6 opened by uloveqian2021 · 3 comments
Can inference tools like vLLM support this model?
#2 opened by zhang001122 · 0 comments
flash attention
#19 opened by GXKIM · 4 comments
Error during finetuning
#10 opened by ifromeast · 2 comments