A bug in computing F1

Question

h-peng17 opened this issue 3 years ago · 2 comments

Thanks for the code. I think there may be a bug in how F1 is computed. In your code, the predicted list (taking argument extraction as an example) has the form [(type1, role1, argument1), ...]. It does not include instance_id, and different instances may share the same (type1, role1, argument1), so a prediction from one instance could match a gold item from another instance and be counted as correct. This would make the final evaluation metrics higher than they should be. Or maybe I have misunderstood your code. I look forward to your reply.

Answer 1 · 2022-07-12T13:16:31.000Z

Hi, thanks for your interest.

The evaluation code counts true predictions instance by instance, so there is no need to consider instance_id.
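To illustrate the difference, here is a minimal sketch that is not taken from the repository. The function names (`count_per_instance`, `count_pooled`, `f1`) and the toy data are hypothetical. It assumes each instance carries its own list of (type, role, argument) tuples and compares per-instance counting with pooling all tuples into one set, which is the situation the question is worried about.

```python
def count_per_instance(gold_by_instance, pred_by_instance):
    """Compare each instance's predictions only with that instance's gold tuples."""
    n_gold = n_pred = n_correct = 0
    for gold, pred in zip(gold_by_instance, pred_by_instance):
        gold_set, pred_set = set(gold), set(pred)
        n_gold += len(gold_set)
        n_pred += len(pred_set)
        n_correct += len(gold_set & pred_set)
    return n_gold, n_pred, n_correct


def count_pooled(gold_by_instance, pred_by_instance):
    """Pool tuples from all instances into one set (loses the instance boundary)."""
    gold_all = {t for gold in gold_by_instance for t in gold}
    pred_all = {t for pred in pred_by_instance for t in pred}
    return len(gold_all), len(pred_all), len(gold_all & pred_all)


def f1(n_gold, n_pred, n_correct):
    """Micro F1 from raw counts, returning 0.0 when undefined."""
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: the two predictions are attached to the wrong instances.
gold = [
    [("Attack", "Target", "city")],    # instance 0
    [("Attack", "Attacker", "army")],  # instance 1
]
pred = [
    [("Attack", "Attacker", "army")],  # instance 0 (wrong for this instance)
    [("Attack", "Target", "city")],    # instance 1 (wrong for this instance)
]

print(f1(*count_per_instance(gold, pred)))  # 0.0
print(f1(*count_pooled(gold, pred)))        # 1.0
```

With per-instance counting, a tuple can only match a gold tuple from the same instance, so the swapped predictions score 0.0. Pooled counting would credit both and report 1.0, which is the inflation described in the question.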

Answer 2 · 2022-07-12T14:28:32.000Z

I got it. Thank you!