Issues
concerns about the length of the input
#14 opened by andyclsr
Running PPO with a subset of RMs
#13 opened by vishwa27yvs
What does lastgaelam refer to when computing advantages?
#8 opened by Congcong-Song
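For context on issue #8: in common PPO implementations, `lastgaelam` is the running accumulator used by Generalized Advantage Estimation (GAE), carried backward through the rollout so each step's advantage folds in the discounted advantage of later steps. A minimal sketch of that loop (the function name and arguments here are illustrative, not taken from this repository's code):

```python
import numpy as np

def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory.

    rewards, values: per-step lists of the same length T.
    last_value: value estimate for the state after the final step.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    # lastgaelam carries the discounted sum of future TD errors
    # backward through the trajectory; it starts at 0 past the end.
    lastgaelam = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        # One-step TD error at time t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Fold in the (gamma * lam)-discounted future advantage.
        lastgaelam = delta + gamma * lam * lastgaelam
        advantages[t] = lastgaelam
    return advantages
```

With `gamma = lam = 1` and zero value estimates, the advantages reduce to reward-to-go sums, which is a quick sanity check for the recursion.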
Issue when running train_sft.sh
#12 opened by yunsaijc
Ran the source code without modifying the model or parameters, but the loss and reward curves oscillate heavily
#9 opened by Congcong-Song
Weights of reward functions in RLHF
#10 opened by jsw7460
Is multi-GPU parallel training supported? During PPO training all the models seem to be loaded onto the same GPU
#7 opened by Congcong-Song
Could you provide the trained modeling_output, e.g. the preference model and reward model?
#6 opened by Congcong-Song
transformers.generation not found during SFT training
#5 opened by Congcong-Song
Is there any plan to share the pre-trained rewards (R1, R2, R3 and R_pref) on HuggingFace?
#4 opened by ZHZisZZ
Open-sourcing the reward models
#1 opened by Glavin001
rename requrements.txt
#2 opened by nishkalavallabhi