apple/ml-qrecc

Averaging of F1 and EM scores

okhat opened this issue · 1 comment

okhat commented

Thanks for the awesome resource!

I was wondering how the F1 and EM scores are averaged. Are they simply the average value across all individual turns in the test set, or are they averaged per conversation first, and then averaged across conversations?

Hi @okhat, apologies for the delayed response.

They are averaged across all individual turns in the test set, not per conversation first.
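To make the difference between the two averaging schemes concrete, here is a minimal sketch. It is not the repository's evaluation code: the function names, the `(conversation_id, score)` layout, and the scores themselves are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def average_over_turns(scores):
    """Mean over every turn, ignoring conversation boundaries.

    This is the scheme described in the reply above.
    `scores` is a list of (conversation_id, score) pairs (assumed layout).
    """
    return sum(score for _, score in scores) / len(scores)


def average_over_conversations(scores):
    """Mean of per-conversation means (the alternative raised in the question)."""
    by_conversation = defaultdict(list)
    for conversation_id, score in scores:
        by_conversation[conversation_id].append(score)
    per_conversation = [sum(v) / len(v) for v in by_conversation.values()]
    return sum(per_conversation) / len(per_conversation)


# Hypothetical per-turn F1 (or EM) scores: conversation "c1" has three turns,
# conversation "c2" has one.
turn_scores = [("c1", 1.0), ("c1", 1.0), ("c1", 1.0), ("c2", 0.0)]

print(average_over_turns(turn_scores))          # 0.75
print(average_over_conversations(turn_scores))  # 0.5
```

The two numbers differ whenever conversations have different lengths: averaging over turns gives longer conversations more weight, while averaging per conversation first weights every conversation equally.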