A bug in computing F1
h-peng17 opened this issue · 2 comments
h-peng17 commented
Thanks for the code. However, I think there may be a bug in the F1 computation. In your code, the predicted list (taking argument extraction as an example) is [(type1, role1, argument1), ...]. It does not include instance_id, and different instances may share the same (type1, role1, argument1), which would inflate the number of true predictions. This would make the final evaluation metrics higher than they should be. Or maybe I have misunderstood your code. I look forward to your reply.
luyaojie commented
Hi, thanks for your attention.
The evaluation code counts true predictions instance by instance, so there is no need to consider instance_id.
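To illustrate the point, here is a minimal sketch of instance-by-instance counting. It is not the repository's actual evaluation code: the names `count_instance` and `micro_f1` and the example tuples are hypothetical and chosen only for illustration. Because gold and predicted tuples are compared within each instance before the counts are summed, a tuple predicted in the wrong instance earns no credit, even if an identical tuple is gold elsewhere.

```python
from collections import Counter


def count_instance(gold, pred):
    """Return (true positives, #gold, #pred) for ONE instance.

    Hypothetical helper: gold and pred are lists of
    (type, role, argument) tuples belonging to the same instance.
    """
    # Multiset intersection: each gold tuple can be matched at most once.
    matched = Counter(gold) & Counter(pred)
    return sum(matched.values()), len(gold), len(pred)


def micro_f1(gold_instances, pred_instances):
    """Sum per-instance counts, then compute micro precision/recall/F1."""
    tp = gold_n = pred_n = 0
    for gold, pred in zip(gold_instances, pred_instances):
        t, g, p = count_instance(gold, pred)
        tp += t
        gold_n += g
        pred_n += p
    precision = tp / pred_n if pred_n else 0.0
    recall = tp / gold_n if gold_n else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# The same tuple is gold in instance 0 but predicted in instance 1.
gold_instances = [[("Attack", "Attacker", "army")], []]
pred_instances = [[], [("Attack", "Attacker", "army")]]

print(micro_f1(gold_instances, pred_instances))
```

In this sketch the prediction is not counted as true, since the comparison never crosses instance boundaries. Pooling all tuples into one global list without an instance identifier would have matched them, which is the inflation the original question was concerned about.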
h-peng17 commented
I got it. Thank you!