Issues
Missing code for ODIN
#46 opened by maoliyuan - 0
Question regarding ARMO stage2-train code
#37 opened by RayWang-iat - 0
stage1-train: RuntimeError: torch.cat(): expected a non-empty list of Tensors
#36 opened by RayWang-iat - 1
ArmoRM env setup and data processing
#35 opened by MaxwellJryao - 7
Code to reproduce ArmoRM
#28 opened by halfrot - 3
reproduce ArmoRM
#30 opened by richhh520 - 1
Clarification on Reward Usage in DPO Training
#33 opened by vincezh2000 - 2
preference dataset 404 not found
#29 opened by wty500 - 1
tutorial to reproduce ArmoRM
#17 opened by pluiez - 2
Regarding the Gemma2 Reward Model Structure
#26 opened by Loong435 - 3
"Token pattern not found in the list" error
#24 opened by nshen7 - 0
How to batch inference?
#25 opened by AIR-hl - 4
Code for Armo on Reward Bench
#15 opened by philschmid - 5
Training and evaluation for the pair_pm model
#21 opened by t-sifanwu - 1
question of chat templates
#16 opened by trueRosun - 1
environment setup issue
#18 opened by WayXG - 1
How do you implement SLiC on the pair_pm model?
#20 opened by t-sifanwu - 4
preference_700K dataset's details?
#19 opened by yechenzhi - 2
How to calculate the avg score of reward bench?
#14 opened by eyuansu62 - 1
Cannot run the training script
#2 opened by peter-peng-w - 1
how to serve this model?
#1 opened by jxgu1016 - 4
Low Safety Score for RM-Gemma-2B Model
#13 opened by loss4Wang - 2
can we say PM is better than BT?
#12 opened by yechenzhi - 1
question about the output
#11 opened by yechenzhi - 1
KeyError: 'input_ids_j' in training
#6 opened by iseesaw