Issues
Question about reward model training in the second paper
#58 opened by Syaoran1 - 12
About the RM training strategy and loss function
#43 opened by tonylin52 - 0
About the linear relationship between root square of kl divs and rewards
#56 opened by shirosheep000 - 1
Question about the LM loss computation in the RM
#48 opened by DZ9 - 1
Question about the KL divergence in the code
#35 opened by rigorosyangffff - 5
Question about reward score computation in the PPO stage
#26 opened by mengyanggithub - 1
Asking for clarification on some unclear points in the second paper
#53 opened by Obr00007576 - 4
Question about the RM contrastive learning training method in the paper
#45 opened by yhhh777 - 1
Generation of the meta dataset in Part2
#51 opened by yata0 - 1
PPOSFTDataset bug report and related questions
#49 opened by DZ9 - 4
Question about merging the Chinese reward-model parameters
#24 opened by hannlp - 4
bash train_ppo_en.sh error
#46 opened by robotzheng - 2
Issues with using the released hh dataset.
#44 opened by jltchiu - 2
Is Mistral-7b currently supported as the base model?
#39 opened by YijuGuo - 3
[Question] Adaptive Margin
#40 opened by eyuansu62 - 1
Is it feasible to retrain the RM with my own base model and my own SFT weights?
#38 opened by camposs1979 - 13
Is there a planned timeline for part 2 on the reward model?
#31 opened by SpongebBob - 1
Inference with SFT and Policy EN models
#36 opened by henrypapadatos - 1
Script for training the reward model
#16 opened by wangzhao88 - 6
Question about merging the reward model weights
#33 opened by HuipengXu - 3
Technical report PART 2
#13 opened by snowkcon - 5
About the reward model
#10 opened by skepsun - 2
Any benchmark vs SFT?
#30 opened by guotong1988 - 1
Training on 8 Nvidia RTX A6000
#19 opened by Top34051 - 1
PPO data en
#27 opened by borisshapa - 1
Issue with deepspeed parameter_offload
#29 opened by LiangZhuuu - 0
PPO GPU memory usage issue
#28 opened by LiangZhuuu - 1
typo
#25 opened by chosenone75 - 12
Some questions about Reward model scoring
#21 opened by hannlp - 1
Which capabilities is the reward model trained on?
#22 opened by yuanhuachao - 1
English PPO data
#20 opened by QYHcrossover - 2
High memory usage issue
#12 opened by QYHcrossover - 2
value model and reward model
#18 opened by KUANWB - 5
PPO training stability issue
#17 opened by hust-kevin - 2
Reward Model
#11 opened by Cyber-Axe - 2
Training script of reward model
#14 opened by zwhe99 - 1
reward_model accuracy
#15 opened by mingrenbuke - 1
support lora training
#9 opened by akk-123 - 2
Data structure used for PPO training
#7 opened by Arain-sh - 4
Can I run this pipeline on A100-40GB?
#8 opened by zwhe99