Issues
Missing code for ODIN
#46 opened by maoliyuan - 0
Question regarding ARMO stage2-train code
#37 opened by RayWang-iat - 0
stage1-train: RuntimeError: torch.cat(): expected a non-empty list of Tensors
#36 opened by RayWang-iat - 1
ArmoRM env setup and data processing
#35 opened by MaxwellJryao - 7
Code to reproduce ArmoRM
#28 opened by halfrot - 3
reproduce ArmoRM
#30 opened by richhh520 - 1
Clarification on Reward Usage in DPO Training
#33 opened by vincezh2000 - 2
preference dataset 404 not found
#29 opened by wty500 - 1
tutorial to reproduce ArmoRM
#17 opened by pluiez - 2
Regarding the Gemma2 Reward Model Structure
#26 opened by Loong435 - 3
"Token pattern not found in the list" error
#24 opened by nshen7 - 0
How to batch inference?
#25 opened by AIR-hl - 4
Code for Armo on Reward Bench
#15 opened by philschmid - 5
Training and evaluation for the pair_pm model
#21 opened by t-sifanwu - 1
question of chat templates
#16 opened by trueRosun - 1
environment setup issue
#18 opened by WayXG - 1
How do you implement SLiC on the pair_pm model?
#20 opened by t-sifanwu - 4
preference_700K dataset's details?
#19 opened by yechenzhi - 2
How to calculate the avg score of reward bench?
#14 opened by eyuansu62 - 1
Cannot run the training script
#2 opened by peter-peng-w - 1
how to serve this model?
#1 opened by jxgu1016 - 4
Low Safety Score for RM-Gemma-2B Model
#13 opened by loss4Wang - 2
can we say PM is better than BT?
#12 opened by yechenzhi - 1
question about the output
#11 opened by yechenzhi - 1
KeyError: 'input_ids_j' in training
#6 opened by iseesaw