Averaging of F1 and EM scores
okhat opened this issue · 1 comment
okhat commented
Thanks for the awesome resource!
I was wondering how the F1 and EM scores are averaged. Are they simply the average value across all individual turns in the test set, or are they averaged per conversation first, and then averaged across conversations?
tuzhucheng commented
Hi @okhat, apologies for the delayed response.
They are averaged across all individual turns in the test set.
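The two schemes from the question can give different numbers when conversations have different lengths. Below is a minimal Python sketch of each. The function names and the list-of-lists layout (one inner list of per-turn scores per conversation) are hypothetical and not taken from the dataset's evaluation script. The sketch also assumes every conversation has at least one turn.

```python
from statistics import mean


def turn_level_average(scores_by_conversation):
    """Pool every turn's score into one list and take a single mean.

    This is the scheme described in the reply above: each turn counts
    equally, so longer conversations carry more weight.
    """
    all_turns = [score for conv in scores_by_conversation for score in conv]
    return mean(all_turns)


def conversation_level_average(scores_by_conversation):
    """Average within each conversation first, then average those means.

    This is the alternative raised in the question: each conversation
    counts equally, regardless of how many turns it has.
    """
    return mean(mean(conv) for conv in scores_by_conversation)


# Hypothetical per-turn F1 scores for two conversations of different lengths.
scores = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0]]
print(turn_level_average(scores))          # 4 / 6, about 0.667
print(conversation_level_average(scores))  # (1.0 + 0.0) / 2 = 0.5
```

With turn-level averaging, the four-turn conversation pulls the score up to about 0.667. Averaging per conversation first would give 0.5 on the same data.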