Issues
How does finetune_cosmopedia.sh train the 8B model?
#27 opened by RuipingWang1986
About the paper: general-ability benchmark scores barely drop, and some even improve
#31 opened by bestpredicts
Question about GPU memory requirements for training
#20 opened by denghj3
Can this method be extended to ViT-style vision encoders?
#33 opened by lucasjinreal
Can Qwen2-7B be trained with this method?
#32 opened by jqtian123
Question about the running procedure
#30 opened by GOOD-N-LCM
About zero initialization and the placement of the expanded layers
#28 opened by ouyanxi1125
Loss converges after training on 10B tokens and will not decrease further
#29 opened by bestpredicts
Thanks for the wonderful project! Why do I always get results showing an apparent loss of the original ability?
#25 opened by hzgdeerHo
Questions about incremental pre-training
#13 opened by zhuxiaobin
Guide to running the code
#11 opened by Abolfazl-kr
Question about Table 7 in the paper
#1 opened by XiaoYee
Training on arbitrary data
#23 opened by HelloWorldLTY
Pretrain code of Mistral-Pro-8B-v0.1
#22 opened by shawnricecake
Do we need to freeze the embedding layer and the lm_head as well during LLaMA Pro-style training?
#21 opened by shamanez
Do the newly added Transformer layers share parameters with the previous layer?
#16 opened by CharlinChen
Comparison with PEFT
#19 opened by LaVieEnRose365
Do larger models require more blocks?
#18 opened by PoseidomWong
A question about post-pretraining
#10 opened by ray075hl
Is the LLaMA Pro implementation in LLaMA Factory written incorrectly?
#15 opened by HuXinjing
Question regarding the difference between llama-pro and the regular llama
#9 opened by WUHU-G
What is the advantage compared with LoRA?
#14 opened by xiaozhu1106
How do we fine-tune the expanded blocks?
#3 opened by win10ogod
How to load the new model weights
#8 opened by khalil-Hennara
Should I freeze norm.weight?
#7 opened by metterian
Full code for continued pre-training
#6 opened by Abolfazl-kr
Code for training llama pro?
#2 opened by yhyu13
arXiv data
#4 opened by ZhengTang1120