Negative reward when serving ArmoRM-Llama3-8B-v0.1
maoliyuan opened this issue · 4 comments
Hello! When I serve ArmoRM-Llama3-8B-v0.1 using OpenRLHF, the output rewards are almost all negative (around -2.0). I've attached some pictures of how I serve the reward model. Is the output of this RM naturally around -2.0, or is the way I serve the RM wrong? (The prompt datasets are also from RLHFlow, e.g. "RLHFlow/iterative-prompt-v1-iter7-20K", and the responses are generated with "RLHFlow/LLaMA3-iterative-DPO-final". We also apply the chat template when creating the prompt-response dataset.)
Could you try the serving example at https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1?
Perhaps I found the reason... I built the model with AutoModel.from_pretrained rather than AutoModelForSequenceClassification.from_pretrained. When I tried the example you gave, the model output something like this:
It's a correct output and has everything I want. However, when I build the model with AutoModel.from_pretrained, the output becomes something like this:
Could you please explain the reason behind this? Thanks a lot.
You may want to check the Hugging Face documentation on the difference between AutoModel and the task-specific Auto classes (here, AutoModelForSequenceClassification).
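Roughly speaking, AutoModel resolves to the bare transformer backbone (hidden states only), while AutoModelForSequenceClassification resolves to the model class registered for this checkpoint, which includes the trained scoring heads. A minimal sketch of the distinction, assuming the checkpoint registers its custom class via trust_remote_code as in the model-card example:

```python
import torch
from transformers import AutoModel, AutoModelForSequenceClassification

path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"

# AutoModel loads only the transformer backbone; its forward pass returns
# hidden states, so any scalar read out of it is not the trained reward score.
backbone = AutoModel.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True)

# AutoModelForSequenceClassification resolves to the reward-model class shipped
# with the checkpoint, which puts the trained reward head(s) on top of the
# backbone and returns the preference score the model card describes.
rm = AutoModelForSequenceClassification.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True)
```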
Thanks a lot! By the way, could you please provide an example that runs inference on a batch of inputs and takes the attention mask as an input? The example you provided on Hugging Face only covers inference on a single input.
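For what it's worth, something along these lines should work, an untested sketch extending the single-input example from the model card; the pad-token choice, padding side, and whether `.score` is returned per sequence for padded batches are assumptions worth verifying against the model's custom code:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda"
path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"

model = AutoModelForSequenceClassification.from_pretrained(
    path, device_map=device, trust_remote_code=True, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(path, use_fast=True)

# Llama-3 tokenizers may not define a pad token; reuse EOS for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Each element is one prompt-response conversation to score.
batch = [
    [{"role": "user", "content": "What is 2+2?"},
     {"role": "assistant", "content": "2+2 equals 4."}],
    [{"role": "user", "content": "Name a color."},
     {"role": "assistant", "content": "Blue."}],
]

# Render each conversation with the chat template, then tokenize the whole
# batch with padding so an attention_mask is produced alongside input_ids.
# The chat template already inserts special tokens, so don't add them twice.
texts = [tokenizer.apply_chat_template(m, tokenize=False) for m in batch]
inputs = tokenizer(texts, return_tensors="pt", padding=True,
                   add_special_tokens=False).to(device)

with torch.no_grad():
    # Depending on how the custom class locates the final token, padding side
    # may matter; check it against the model's remote code.
    output = model(input_ids=inputs["input_ids"],
                   attention_mask=inputs["attention_mask"])
    # .score holds the scalar preference score, one per sequence.
    scores = output.score.float().cpu()

print(scores)
```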