Yale-LILY/SummEval

Is it possible to release a table of all paired data and scoring results?

johntzwei opened this issue · 5 comments

Dear SummEval authors,

Thank you for your great dataset! I am currently writing a metric analysis paper and hope to use your summarization dataset. I wanted to ask whether you already have some of the data compiled and, if so, whether you could release a pandas DataFrame or JSON file containing:

  1. Paired data
  2. Metric scores (for the ~11,000 outputs for each of the 23 systems)

This would save me a lot of time. Thank you!

Hi @johntzwei,

Thanks for your interest!
For 1), we do not plan to release the paired output. If you run the data_processing/pair_data.py script, you should be able to reproduce the data. Let us know if you run into any issues with the script; we're more than happy to help.

For 2), we do have a jsonl file with the scores, but we need to organize it before release. I am a bit backlogged right now but will follow up on this.

Awesome, thank you! When do you think you could have this ready? I am hoping to finish the analysis in time for a 1/1 deadline. If you can't do it by then, I will compute the scores myself. Let me know, thanks!

Sorry we were unable to get this to you before your deadline! Deadlines and the holidays had me backlogged. We'll upload the scores shortly and also update the paper with the camera-ready version, hopefully in the next week or so.

Closing this issue since the scores are now available here along with code for reproducing the correlations.
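
For anyone who wants the scores as a pandas DataFrame, as requested at the top of this thread, here is a minimal sketch of loading a jsonl score file. The field names (id, system, metric_a, human) and the sample records are hypothetical placeholders for illustration, not the schema or contents of the released file, so adjust them to whatever the file actually contains.

```python
import io

import pandas as pd

# Hypothetical records for illustration only: these field names and values
# are assumptions, not the schema or data of the released SummEval scores.
sample_jsonl = io.StringIO(
    '{"id": "doc-1", "system": "M1", "metric_a": 0.41, "human": 3.7}\n'
    '{"id": "doc-2", "system": "M1", "metric_a": 0.35, "human": 3.1}\n'
    '{"id": "doc-1", "system": "M2", "metric_a": 0.52, "human": 4.2}\n'
    '{"id": "doc-2", "system": "M2", "metric_a": 0.48, "human": 4.0}\n'
)


def load_scores(source):
    """Read one JSON object per line into a DataFrame.

    `source` can be a file path or a file-like object.
    """
    return pd.read_json(source, lines=True)


scores = load_scores(sample_jsonl)

# Average each (hypothetical) score column per system.
per_system = scores.groupby("system")[["metric_a", "human"]].mean()
print(per_system)
```

To read a file on disk, pass its path to load_scores instead of the in-memory sample.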

We ended up using your dataset and the scores you generated for me in our paper. Thank you!