Bug in Data Splitting on FewRel Dataset

Question

SaeedNajafi opened this issue 2 years ago · 2 comments

Hey,
The FewRel dataset has 700 sentences per relation ID.

After splitting FewRel into train/dev/test, the test split should contain 10,500 sentences, because it has 15 unseen relation IDs (15 × 700 = 10,500).

Using your code, we get fewer sentences in the splits. I tested with seed 12321, and 200 sentences are missing from the test split.

Please fix this issue and re-evaluate the results for the main paper.

Answer 1 · 2022-09-27T03:11:56.000Z

Hi, the splits have fewer samples because some samples share exactly the same sentence text. These samples are merged into a single multi-triplet sentence, so each merge reduces the sentence count.
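To illustrate why merging lowers the count, here is a minimal sketch, assuming each raw sample is a `(text, triplet)` pair and that samples are merged on exact text match. The `Triplet`, `Sentence`, and `merge_by_text` names are hypothetical and are not taken from the repository's code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Triplet:
    # Hypothetical triplet record: head entity, tail entity, relation label.
    head: str
    tail: str
    relation: str


@dataclass
class Sentence:
    # One sentence text together with every triplet annotated on it.
    text: str
    triplets: List[Triplet] = field(default_factory=list)


def merge_by_text(samples: List[Tuple[str, Triplet]]) -> List[Sentence]:
    """Group samples with identical text into one multi-triplet Sentence.

    Assumption: the merge key is the exact sentence text. Dicts keep
    insertion order (Python 3.7+), so output order follows first occurrence.
    """
    merged = {}
    for text, triplet in samples:
        if text not in merged:
            merged[text] = Sentence(text=text)
        merged[text].triplets.append(triplet)
    return list(merged.values())
```

For example, three raw samples where two share the same text become two sentences, one of which carries two triplets. A count that falls short of 700 × (number of relations) is therefore expected under this kind of merging.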

Answer 2 · 2022-10-17T15:13:31.000Z

The multi-triplet sentences are still in the data, but for prediction it is important to use the multi-eval mode on sentences that have multiple triplets.
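The thread does not define the multi-eval mode. One plausible reading: each sentence's full set of predicted triplets is scored against all of its gold triplets, not just one. The sketch below assumes that reading and computes micro-averaged precision, recall, and F1. The function name and the `(head, relation, tail)` tuple layout are hypothetical and are not the repository's API.

```python
from typing import Iterable, Set, Tuple

# Hypothetical triplet layout: (head, relation, tail).
TripletT = Tuple[str, str, str]


def multi_triplet_prf(
    gold: Iterable[Set[TripletT]], pred: Iterable[Set[TripletT]]
) -> Tuple[float, float, float]:
    """Micro-averaged P/R/F1 where every sentence contributes all of its triplets.

    gold and pred are parallel sequences, one set of triplets per sentence.
    """
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # triplets predicted and present in gold
        n_pred += len(p)   # all predicted triplets for this sentence
        n_gold += len(g)   # all gold triplets for this sentence
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

If a merged sentence has two gold triplets and the model outputs only one of them, this scoring gives precision 1.0 and recall 0.5. A single-triplet evaluation would not penalize the missing triplet at all.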