google-research-datasets/natural-questions

clarification of 'score'

filbertphang opened this issue · 1 comment

Hello, 'nq_eval.py' states that "Each prediction should be provided with a long answer score, and a short answers score".

Could you clarify what these scores refer to? Are they meant to represent the model's confidence in its predictions, or is there a fixed method for computing them?

For example, can we define the 'score' to simply be the sum of the start and end logits of the prediction?
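To make the question concrete, here is a minimal sketch of what I have in mind. The function name `best_span` and the `max_span_len` cutoff are my own placeholders, not anything taken from 'nq_eval.py', and I do not know whether this matches the intended definition of the score.

```python
from typing import List, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_span_len: int = 30,  # hypothetical cutoff, not from nq_eval.py
) -> Tuple[int, int, float]:
    """Return (start, end, score) for the highest-scoring span.

    The score is simply start_logit + end_logit, which is the
    definition being asked about, not a confirmed convention.
    """
    best = (0, 0, float("-inf"))
    for s, s_logit in enumerate(start_logits):
        # Only consider spans that end at or after the start token
        # and contain at most max_span_len tokens.
        last = min(s + max_span_len, len(end_logits))
        for e in range(s, last):
            score = s_logit + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Toy per-token logits for a 4-token passage.
start_logits = [0.1, 2.0, 0.3, -1.0]
end_logits = [0.0, 0.5, 1.5, 0.2]
start, end, score = best_span(start_logits, end_logits)
print(start, end, score)
```

Here the value returned as `score` would be what I submit as the short answer score for that prediction.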

Lastly, are scores also required for null predictions?

Thank you very much!

Did you figure out how to implement this? I'm stuck on the same issue.