Question about reward model evaluation metric
waterhorse1 commented
Thanks for this great work! I have a question about how you measure the performance of the reward model. Section 2.1 says: 'We evaluate a reward model by its ability to perform best-of-N search over uniformly sampled solutions from the generator'. Why not directly compute the reward model's accuracy on the test set and use that as the metric?
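
To make the question concrete, here is a minimal sketch of the two metrics as I understand them. The function names, the nested-list data layout, and the 0.5 decision threshold are my own assumptions for illustration; this is not the repository's evaluation code.

```python
from typing import Sequence


def best_of_n_accuracy(
    scores: Sequence[Sequence[float]],
    correct: Sequence[Sequence[bool]],
) -> float:
    """Fraction of problems where the top-scored sample is correct.

    scores[i][j]  : reward model score for the j-th sampled solution to problem i
    correct[i][j] : whether that solution is actually correct
    """
    hits = 0
    for problem_scores, problem_correct in zip(scores, correct):
        # Pick the index of the solution the reward model ranks highest.
        best = max(range(len(problem_scores)), key=lambda j: problem_scores[j])
        hits += bool(problem_correct[best])
    return hits / len(scores)


def per_sample_accuracy(
    scores: Sequence[Sequence[float]],
    correct: Sequence[Sequence[bool]],
    threshold: float = 0.5,  # assumed cutoff for calling a solution "correct"
) -> float:
    """Fraction of individual solutions the reward model classifies correctly."""
    right = 0
    total = 0
    for problem_scores, problem_correct in zip(scores, correct):
        for score, is_correct in zip(problem_scores, problem_correct):
            right += (score >= threshold) == bool(is_correct)
            total += 1
    return right / total


# Toy data: 2 problems, 3 sampled solutions each (made-up numbers).
toy_scores = [[0.9, 0.2, 0.6], [0.4, 0.7, 0.1]]
toy_correct = [[True, False, False], [False, False, True]]

print(best_of_n_accuracy(toy_scores, toy_correct))
print(per_sample_accuracy(toy_scores, toy_correct))
```

The first function only cares about which solution ranks highest per problem, while the second scores every solution independently, so the two can disagree on which reward model is better.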