Yale-LILY/SummEval

[Data] M0 aligned document count is much smaller than the original CNN/DM document count

La-SilverLand opened this issue · 3 comments

The original CNN/DM dataset contains 92,579 + 219,506 = 312,085 documents in total.
However, the M0 (lead-3) data has only 11,490 aligned documents, with 38 duplicated ids, which is far fewer than the original data.
Can you explain why?

Hi @La-SilverLand!

The model outputs correspond to the test split, consisting of 11490 examples, and not the entire dataset.
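To reproduce counts like the ones reported above, something along these lines may help. It is only a sketch: it assumes the model outputs are stored one JSON object per line with an `id` field, and the helper name `count_ids` and the sample records are hypothetical, not part of the repository.

```python
import json
from collections import Counter


def count_ids(jsonl_lines, id_field="id"):
    """Return (total records, unique ids, ids that occur more than once).

    Assumes each non-empty line is a JSON object carrying `id_field`;
    the field name is an assumption about the file layout.
    """
    ids = [json.loads(line)[id_field] for line in jsonl_lines if line.strip()]
    counts = Counter(ids)
    duplicated = sum(1 for c in counts.values() if c > 1)
    return len(ids), len(counts), duplicated


# Tiny in-memory stand-in for a model-output file (made-up records).
sample = [
    '{"id": "a", "decoded": "x"}',
    '{"id": "b", "decoded": "y"}',
    '{"id": "a", "decoded": "x"}',
]
total, unique, duplicated = count_ids(sample)
print(total, unique, duplicated)
```

With a real file, passing the open file handle instead of `sample` should work, since the function only iterates over lines.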

Does this test split use the URL list at https://github.com/abisee/cnn-dailymail/tree/master/url_lists/all_test.txt , or did you randomly sample test cases to get 11,490 in total?

Hi @La-SilverLand!

This is the standard test set, so yes, it uses the URLs in that list.
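If you want to check this yourself, a rough sketch follows. The preprocessing script in abisee/cnn-dailymail names each story by the SHA-1 hex digest of its URL, so the same hash can be compared against the ids in the model outputs. Note the assumptions: that each output id ends in that hash after a final `-` (for example a `dm-test-<hash>` pattern), and the helper names `url_hashes` and `ids_outside_split` are hypothetical.

```python
import hashlib


def url_hashes(urls):
    """SHA-1 hex digest of every non-empty URL line, as a set."""
    return {
        hashlib.sha1(u.strip().encode("utf-8")).hexdigest()
        for u in urls
        if u.strip()
    }


def ids_outside_split(ids, urls):
    """Return ids whose trailing hash is not among the URL hashes.

    Assumption: the hash is the part of the id after the last '-'.
    """
    hashes = url_hashes(urls)
    return [i for i in ids if i.rsplit("-", 1)[-1] not in hashes]


# Made-up URLs and ids, for illustration only.
urls = ["http://example.com/a\n", "http://example.com/b\n"]
known = hashlib.sha1(b"http://example.com/a").hexdigest()
ids = ["dm-test-" + known, "cnn-test-deadbeef"]
print(ids_outside_split(ids, urls))
```

An empty result on the real data would suggest every output id comes from `all_test.txt`, provided the id format assumption holds.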