openai/prm800k

Question about reward model evaluation metric


Thanks for this great work! I have a question about how you measure reward model performance. Section 2.1 says: 'We evaluate a reward model by its ability to perform best-of-N search over uniformly sampled solutions from the generator'. Why not directly compute the reward model's accuracy on the test set and use that as the metric?
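To make the two options concrete, here is a minimal sketch of how I understand them. It is not taken from the repository: the function names, the nested-list data layout, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import Sequence


def best_of_n_accuracy(
    scores: Sequence[Sequence[float]],
    correct: Sequence[Sequence[bool]],
) -> float:
    """Fraction of problems where the top-scored solution is correct.

    scores[i][j]  : reward model score for solution j of problem i (hypothetical layout)
    correct[i][j] : whether solution j of problem i reaches the right final answer
    """
    hits = 0
    for problem_scores, problem_correct in zip(scores, correct):
        # Pick the index of the solution the reward model ranks highest.
        best = max(range(len(problem_scores)), key=lambda j: problem_scores[j])
        hits += bool(problem_correct[best])
    return hits / len(scores)


def classification_accuracy(
    scores: Sequence[Sequence[float]],
    correct: Sequence[Sequence[bool]],
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> float:
    """Fraction of individual solutions whose thresholded score matches the label."""
    agree = 0
    total = 0
    for problem_scores, problem_correct in zip(scores, correct):
        for score, label in zip(problem_scores, problem_correct):
            agree += (score >= threshold) == bool(label)
            total += 1
    return agree / total


# Toy data: two problems with three sampled solutions each.
scores = [[0.9, 0.2, 0.4], [0.3, 0.8, 0.6]]
correct = [[True, False, False], [False, False, True]]

print(best_of_n_accuracy(scores, correct))       # 0.5
print(classification_accuracy(scores, correct))  # 5 of 6 solutions classified correctly
```

In this toy case the per-solution accuracy is 5/6, yet best-of-N picks a wrong solution on the second problem, so the two metrics can disagree. I would like to understand why the best-of-N number was preferred.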