declare-lab/RelationPrompt

Bug in Data Splitting on FewRel Dataset

SaeedNajafi opened this issue · 2 comments

Hey,
The FewRel dataset has 700 sentences per relation ID.

After splitting FewRel into train/dev/test, you should get 10,500 sentences in the test split, since it holds 15 unseen relation IDs (15 × 700 = 10,500).

Using your code, we get fewer sentences in the splits. I tested with seed 12321, and 200 sentences are missing from the test split.

Please fix this issue and re-evaluate the results for the main paper.

Hi, the sample count is lower because some samples share the same text, so they are merged into multi-triplet sentences.
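
To illustrate why the sentence count can drop while no triplets are lost, here is a minimal sketch of merging samples by text. It is not the repository's actual code: the function name `merge_by_text`, the tuple layout, and the toy samples are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_by_text(samples):
    """Group (text, head, relation, tail) samples that share the same text.

    Hypothetical helper: samples with identical text collapse into one
    sentence entry that carries several triplets.
    """
    grouped = defaultdict(list)
    for text, head, relation, tail in samples:
        grouped[text].append((head, relation, tail))
    return dict(grouped)


# Toy data (made up): the first two samples share the same text.
samples = [
    ("A was born in B and lives in C.", "A", "place of birth", "B"),
    ("A was born in B and lives in C.", "A", "residence", "C"),
    ("D is the capital of E.", "D", "capital", "E"),
]

sentences = merge_by_text(samples)
num_sentences = len(sentences)  # fewer entries than raw samples
num_triplets = sum(len(triplets) for triplets in sentences.values())

print(num_sentences, num_triplets)
```

In this toy case three raw samples become two sentences, but all three triplets are still present, which is the same effect that makes the test split look smaller than 15 × 700.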

The multi-triplet sentences are still in the data, but for prediction it is important to use a multi-eval mode on sentences with multiple triplets.
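
As a rough idea of what scoring multi-triplet sentences can involve, the sketch below compares the set of predicted triplets with the set of gold triplets for each sentence. This is a generic set-based precision/recall/F1 computation, assumed for illustration only; it is not the repository's multi-eval implementation, and `triplet_prf` is a hypothetical name.

```python
def triplet_prf(gold, pred):
    """Micro precision, recall, and F1 over per-sentence triplet sets.

    gold and pred are parallel lists; each element is an iterable of
    (head, relation, tail) tuples for one sentence. Hypothetical helper.
    """
    true_pos = n_pred = n_gold = 0
    for gold_triplets, pred_triplets in zip(gold, pred):
        gold_set, pred_set = set(gold_triplets), set(pred_triplets)
        true_pos += len(gold_set & pred_set)
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (made up): one sentence with two gold triplets,
# of which the model recovers only one.
gold = [[("A", "place of birth", "B"), ("A", "residence", "C")]]
pred = [[("A", "place of birth", "B")]]

precision, recall, f1 = triplet_prf(gold, pred)
print(precision, recall, f1)
```

Under this assumed scheme, a model that predicts a single triplet per sentence is penalized in recall on merged sentences, which is why single-triplet evaluation is not appropriate for them.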