RLHFlow/Online-RLHF

Negative reward when serving ArmoRM-Llama3-8B-v0.1

maoliyuan opened this issue · 4 comments

Hello! When I serve ArmoRM-Llama3-8B-v0.1 using OpenRLHF, the output rewards are almost all negative (around -2.0). I've attached some pictures of how I served the reward model. Is the output of this RM naturally around -2.0, or is the way I serve the RM wrong? (The prompt dataset is also from RLHFlow, e.g. "RLHFlow/iterative-prompt-v1-iter7-20K", the responses are generated by "RLHFlow/LLaMA3-iterative-DPO-final", and we apply the chat template when creating the prompt-response dataset.)
[Screenshots: serve-armo-reward-model, serve-armo-reward-model1, serve-armo-reward-model2]
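For context, we prepare each prompt-response pair roughly like this before scoring it with the RM (a simplified illustration, not the exact serving code in the screenshots; the prompt/response strings below are made up):

```python
# Simplified illustration of how we build the scored inputs. In practice the
# prompts come from RLHFlow/iterative-prompt-v1-iter7-20K and the responses
# are generated by RLHFlow/LLaMA3-iterative-DPO-final.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("RLHFlow/ArmoRM-Llama3-8B-v0.1")

prompt = "What is the capital of France?"
response = "The capital of France is Paris."

messages = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response},
]

# Apply the chat template so the RM sees the same format it was trained with.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```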

I think I found the reason... I built the model with AutoModel.from_pretrained rather than AutoModelForSequenceClassification.from_pretrained. When I tried the example you gave, the model outputs something like this:
[Screenshot: armo-rm-custom-output]
That is the correct output and contains everything I want. However, when I build the model with AutoModel.from_pretrained, the output becomes something like this:
[Screenshot: armo-rm-auto-output]
Could you please explain the reason behind this? Thanks a lot.

You may want to check the Hugging Face documentation on the difference between AutoModel and the task-specific AutoModel classes (e.g. AutoModelForSequenceClassification).
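Roughly speaking (the Hugging Face docs and the ArmoRM model card are the authoritative references), AutoModel typically resolves to the plain Llama backbone, so its forward pass only returns hidden states, while AutoModelForSequenceClassification together with trust_remote_code=True loads the custom ArmoRM class with the reward head on top. A minimal sketch of the two load paths:

```python
import torch
from transformers import AutoModel, AutoModelForSequenceClassification

path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"

# Typically resolves to the plain Llama backbone via model_type, so forward()
# returns hidden states only and the reward head in the checkpoint is unused.
backbone = AutoModel.from_pretrained(path, torch_dtype=torch.bfloat16)

# Resolves, via the custom code shipped in the repo, to the ArmoRM class whose
# forward() returns the reward outputs shown in the model-card example.
reward_model = AutoModelForSequenceClassification.from_pretrained(
    path, trust_remote_code=True, torch_dtype=torch.bfloat16
)
```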

Thanks a lot! By the way, could you please provide an example that runs inference on a batch of inputs and takes an attention mask as input? The example you provided on Hugging Face only covers inference for a single input.
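Something like the sketch below is what I have in mind; I'm not sure whether the custom ArmoRM class handles padded batches and attention_mask this way, whether the padding side matters, or whether the output field is still called score for a batch, so please correct anything that is wrong:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda"
path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"

model = AutoModelForSequenceClassification.from_pretrained(
    path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map=device
)
tokenizer = AutoTokenizer.from_pretrained(path)
if tokenizer.pad_token is None:
    # Assumption: fall back to EOS if no pad token is configured.
    tokenizer.pad_token = tokenizer.eos_token

conversations = [
    [{"role": "user", "content": "What is 2+2?"},
     {"role": "assistant", "content": "2+2 equals 4."}],
    [{"role": "user", "content": "Name a primary color."},
     {"role": "assistant", "content": "Red is a primary color."}],
]

# Render each conversation with the chat template, then tokenize as a padded
# batch. add_special_tokens=False because the Llama-3 chat template already
# inserts <|begin_of_text|> (worth double-checking for this tokenizer).
texts = [tokenizer.apply_chat_template(c, tokenize=False) for c in conversations]
batch = tokenizer(
    texts, padding=True, truncation=True, add_special_tokens=False,
    return_tensors="pt",
).to(device)

with torch.no_grad():
    out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
    # Assuming the batched output exposes per-example scores as in the
    # single-input example.
    scores = out.score.float().cpu()

print(scores)  # one preference score per conversation
```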