Issues
About the System Prompt
#28 opened by DirtyKnightForVi - 1
Will your team release an upgraded long-context version?
#47 opened by edisonzf2020 - 1
Some questions about the model metrics
#45 opened by MangoFF - 4
Help needed reproducing TriviaQA results
#33 opened by HYZ17 - 1
Deepseek VL?
#44 opened by IdiotSandwichTheThird - 0
Could you please release intermediate pretraining checkpoints at HuggingFace?
#43 opened by Yangjinluan - 1
How should Deepseek SFT data that includes a system prompt be handled?
#41 opened by xiatingyu - 1
Scaling laws data
#42 opened by borgr - 1
Do the LLM and coder base models share the same architecture, or are there differences?
#40 opened by cherishtttz - 1
AWS CLI usage issue and deepseek-ai S3 bucket access issue
#34 opened by go-with-me000 - 2
can you please share sharded (<2gb / bin) model?
#2 opened by amrrs - 3
Programming Language in LeetCode Weekly Contest
#24 opened by ShaneTian - 1
Question about using vllm
#37 opened by xuyifan-0731 - 1
Training data distribution
#36 opened by pluiez - 2
Help needed reproducing AlignBench evaluation results
#32 opened by FoolMark - 1
Will finetune scripts be provided?
#23 opened by ftgreat - 1
Missing files in released pretrain ckpts
#26 opened by Wizardcoast - 4
LoRA SFT of the deepseek 67b base version
#20 opened by liwenju0 - 1
DeepSeek 7B Chat LoRA works great!
#12 opened by KMnO4-zx - 1
LeetCode Weekly Contest Data
#8 opened by tonysy - 0
The figures are great
#7 opened by tpoisonooo - 1
Learning rate schedule seems very helpful.
#1 opened by GanjinZero - 1
About LR schedule
#3 opened by futuristx