RLHFlow/Online-RLHF

Negative reward when serving ArmoRM-Llama3-8B-v0.1

maoliyuan opened this issue · 4 comments

Hello! When I serve ArmoRM-Llama3-8B-v0.1 using OpenRLHF, the output rewards are almost all negative (around -2.0). I've attached some pictures of how I served the reward model. Is the output of this RM naturally around -2.0, or is the way I serve the RM wrong? (The prompt dataset is also from RLHFlow, e.g. "RLHFlow/iterative-prompt-v1-iter7-20K", the responses are generated by "RLHFlow/LLaMA3-iterative-DPO-final", and we apply the chat template when creating the prompt-response dataset.)
[Screenshots: serve-armo-reward-model, serve-armo-reward-model1, serve-armo-reward-model2]
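For context, we prepare each prompt-response pair roughly like this before scoring it with the RM (a simplified illustration, not the exact serving code in the screenshots; the prompt/response strings below are made up):

```python
# Simplified illustration of how we build the scored inputs. In practice the
# prompts come from RLHFlow/iterative-prompt-v1-iter7-20K and the responses
# are generated by RLHFlow/LLaMA3-iterative-DPO-final.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("RLHFlow/ArmoRM-Llama3-8B-v0.1")

prompt = "What is the capital of France?"
response = "The capital of France is Paris."

messages = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response},
]

# Apply the chat template so the RM sees the same format it was trained with.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```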

I think I found the reason... I built the model with AutoModel.from_pretrained rather than AutoModelForSequenceClassification.from_pretrained. When I tried the example you gave, the model outputs something like this:
[Screenshot: armo-rm-custom-output]
That is the correct output and contains everything I want. However, when I build the model with AutoModel.from_pretrained, the output becomes something like this:
[Screenshot: armo-rm-auto-output]
Could you please explain the reason behind this? Thanks a lot.

You may want to check the Hugging Face documentation on the difference between AutoModel and the task-specific AutoModel classes (e.g. AutoModelForSequenceClassification).
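Roughly speaking (the Hugging Face docs and the ArmoRM model card are the authoritative references), AutoModel typically resolves to the plain Llama backbone, so its forward pass only returns hidden states, while AutoModelForSequenceClassification together with trust_remote_code=True loads the custom ArmoRM class with the reward head on top. A minimal sketch of the two load paths:

```python
import torch
from transformers import AutoModel, AutoModelForSequenceClassification

path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"

# Typically resolves to the plain Llama backbone via model_type, so forward()
# returns hidden states only and the reward head in the checkpoint is unused.
backbone = AutoModel.from_pretrained(path, torch_dtype=torch.bfloat16)

# Resolves, via the custom code shipped in the repo, to the ArmoRM class whose
# forward() returns the reward outputs shown in the model-card example.
reward_model = AutoModelForSequenceClassification.from_pretrained(
    path, trust_remote_code=True, torch_dtype=torch.bfloat16
)
```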

Thanks a lot! By the way, could you please provide an example that runs inference on a batch of inputs and takes an attention mask as input? The example you provided on Hugging Face only covers inference for a single input.
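Something like the sketch below is what I have in mind; I'm not sure whether the custom ArmoRM class handles padded batches and attention_mask this way, whether the padding side matters, or whether the output field is still called score for a batch, so please correct anything that is wrong:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda"
path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"

model = AutoModelForSequenceClassification.from_pretrained(
    path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map=device
)
tokenizer = AutoTokenizer.from_pretrained(path)
if tokenizer.pad_token is None:
    # Assumption: fall back to EOS if no pad token is configured.
    tokenizer.pad_token = tokenizer.eos_token

conversations = [
    [{"role": "user", "content": "What is 2+2?"},
     {"role": "assistant", "content": "2+2 equals 4."}],
    [{"role": "user", "content": "Name a primary color."},
     {"role": "assistant", "content": "Red is a primary color."}],
]

# Render each conversation with the chat template, then tokenize as a padded
# batch. add_special_tokens=False because the Llama-3 chat template already
# inserts <|begin_of_text|> (worth double-checking for this tokenizer).
texts = [tokenizer.apply_chat_template(c, tokenize=False) for c in conversations]
batch = tokenizer(
    texts, padding=True, truncation=True, add_special_tokens=False,
    return_tensors="pt",
).to(device)

with torch.no_grad():
    out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
    # Assuming the batched output exposes per-example scores as in the
    # single-input example.
    scores = out.score.float().cpu()

print(scores)  # one preference score per conversation
```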