Why do you use partial match max metric for QA
vkaul11 opened this issue · 1 comments
vkaul11 commented
Just wanted to know why we have https://github.com/hsiehjackson/RULER/blob/main/scripts/eval/synthetic/constants.py#L25
Why is this different from string_match_all for QA specifically? Basically, if any one of the references appears in the prediction, it counts as correct? I didn't quite understand this.
def string_match_part(preds, refs):
    score = sum([max([1.0 if r.lower() in pred.lower() else 0.0 for r in ref]) for pred, ref in zip(preds, refs)]) / len(preds) * 100
    return round(score, 2)
hsiehjackson commented
string_match_part gives a 100% score when the prediction matches any one of the references; string_match_all requires the prediction to match all of the references to get a 100% score. We use string_match_part in QA tasks because most of the references for a question are paraphrases of the same answer, so matching one of them is enough.
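To make the difference concrete, here is a minimal sketch that scores one prediction with both metrics. string_match_part is copied from the snippet in the question; string_match_all is an assumed implementation written to fit the description above (the fraction of references found in the prediction, averaged over examples) and may differ from the repository's code. The example prediction and references are made up for illustration.

```python
def string_match_part(preds, refs):
    # Per example: 1.0 if ANY reference is a case-insensitive substring
    # of the prediction, else 0.0. Then average over examples.
    score = sum(
        max(1.0 if r.lower() in pred.lower() else 0.0 for r in ref)
        for pred, ref in zip(preds, refs)
    ) / len(preds) * 100
    return round(score, 2)


def string_match_all(preds, refs):
    # ASSUMPTION: sketch based on the description in this thread, not
    # verified against the repository. Per example: fraction of references
    # found in the prediction. Then average over examples.
    score = sum(
        sum(1.0 if r.lower() in pred.lower() else 0.0 for r in ref) / len(ref)
        for pred, ref in zip(preds, refs)
    ) / len(preds) * 100
    return round(score, 2)


# Hypothetical QA example: two paraphrased references for one question.
preds = ["The capital is Paris."]
refs = [["Paris", "City of Paris"]]

part_score = string_match_part(preds, refs)  # one reference matched -> full credit
all_score = string_match_all(preds, refs)    # one of two matched -> half credit
print(part_score, all_score)
```

With paraphrased references, a correct answer rarely contains every phrasing, so an "all" style metric would penalize it even though the answer is right.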