Bug in Data Splitting on FewRel Dataset
SaeedNajafi opened this issue · 2 comments
SaeedNajafi commented
Hey,
The FewRel dataset has 700 sentences per relation id.
After splitting FewRel into train/dev/test, the test split should contain 10,500 sentences, since it has 15 unseen relation ids (15 × 700 = 10,500).
Using your code, the splits contain fewer sentences than expected. I tested with seed 12321, and the test split is 200 sentences short.
Please fix this issue and re-evaluate the results for the main paper.
chiayewken commented
Hi, the reason for the smaller count is that some samples share the same text, so they are merged into multi-triplet sentences.
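To illustrate the merging described above, here is a minimal sketch. It assumes each sample is a `(text, head, relation, tail)` tuple; the function name `merge_by_text` and the toy data are hypothetical and are not taken from the repository's actual code.

```python
from collections import defaultdict


def merge_by_text(samples):
    """Group samples that share the same text into one multi-triplet sentence.

    Hypothetical helper: `samples` is assumed to be an iterable of
    (text, head, relation, tail) tuples.
    """
    grouped = defaultdict(list)
    for text, head, relation, tail in samples:
        grouped[text].append((head, relation, tail))
    return dict(grouped)


# Toy data: the first two samples have identical text but different triplets.
samples = [
    ("A was born in B and lives in C.", "A", "place of birth", "B"),
    ("A was born in B and lives in C.", "A", "residence", "C"),
    ("D is the capital of E.", "D", "capital of", "E"),
]

sentences = merge_by_text(samples)
print(len(samples), len(sentences))
```

In this toy case three samples become two sentences, so the sentence count falls below the sample count while every triplet is still kept.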
SaeedNajafi commented
The multi-triplet sentences are in the data, but for prediction, it is important to use a multi-eval mode on sentences with multiple triplets.
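As a rough sketch of what scoring sentences with multiple triplets could look like, the snippet below computes micro precision, recall and F1 over per-sentence sets of triplets. This is an assumption about the general idea only: the function `score_triplets` and the data layout (a dict from text to a list of `(head, relation, tail)` tuples) are hypothetical and may differ from the repository's multi-eval mode.

```python
def score_triplets(gold, pred):
    """Micro precision, recall and F1 over per-sentence sets of triplets.

    Hypothetical helper: `gold` and `pred` are assumed to map each sentence
    text to a list of (head, relation, tail) tuples. Predictions for texts
    that are not in `gold` are ignored.
    """
    n_correct = n_pred = n_gold = 0
    for text, gold_triplets in gold.items():
        gold_set = set(gold_triplets)
        pred_set = set(pred.get(text, []))
        n_correct += len(gold_set & pred_set)
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data: one sentence with two gold triplets, only one of them predicted.
gold = {"A was born in B and lives in C.": [("A", "place of birth", "B"),
                                           ("A", "residence", "C")]}
pred = {"A was born in B and lives in C.": [("A", "place of birth", "B")]}

precision, recall, f1 = score_triplets(gold, pred)
print(precision, recall, f1)
```

With this kind of scoring, a model that outputs a single triplet for a two-triplet sentence is penalized on recall, which single-triplet evaluation would not show.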