Yale-LILY/SummEval

Help Reproducing the Results

Closed this issue · 7 comments

Dear authors, thanks a lot for your work.
I am trying to do a follow-up work and add a new metric to your library.
However, I am having trouble reproducing your results (Table 2). My evaluation is as follows:

scores = []
for summary in generated_summaries:
    # one aggregated metric value per generated summary
    scores.append(metric.evaluate_batch([summary], [references], aggregate=True))

scipy.stats.pearsonr(scores, target_scores)

with the target score being the average of the 4 annotation scores.
Am I missing something?
Cheers

Do you have a piece of code I could look at?

Hi @PierreColombo
Yes, that was how we initially calculated the score. When we updated the paper, we followed Louis and Nenkova (Section 3.1) and now report system-level correlations. We'll have that version out on arXiv by next Wednesday, and I'll try to provide some reference code with that release.
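
In the meantime, here is a rough sketch of what system-level correlation means here: average each score over all summaries produced by a system, then correlate the per-system averages across systems. The record keys below (model_id, metric_score, human_score) are placeholders for illustration, not the repo's actual schema.

# System-level correlation in the spirit of Louis and Nenkova (Sec. 3.1):
# average per system first, then correlate the per-system averages.
from collections import defaultdict
from scipy.stats import pearsonr, kendalltau

def system_level_correlation(records):
    metric_by_system = defaultdict(list)
    human_by_system = defaultdict(list)
    for r in records:
        metric_by_system[r["model_id"]].append(r["metric_score"])
        human_by_system[r["model_id"]].append(r["human_score"])

    systems = sorted(metric_by_system)
    metric_means = [sum(metric_by_system[s]) / len(metric_by_system[s]) for s in systems]
    human_means = [sum(human_by_system[s]) / len(human_by_system[s]) for s in systems]
    return pearsonr(metric_means, human_means), kendalltau(metric_means, human_means)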

Hi @Alex-Fabbri, I am trying to reproduce the Kendall tau correlation scores quoted in your paper. Could you please clarify which ROUGE-1/2/3/4 variant was used (i.e., whether it is the precision, recall, or F1 score)?
Thanks

Hi @tanay2001, we used the F1 scores. I'm also attaching a file to help with reproducing the scores.

code.zip
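
For a quick sanity check of the F1 setup, something along these lines should work with the summ_eval ROUGE wrapper; the exact output key names (e.g. rouge_1_f_score) may vary by version, so print the returned dict and pick out the *_f_score entries yourself.

# Sketch: aggregated ROUGE via summ_eval; inspect the returned dict and
# take the *_f_score entries (key names may differ across versions).
from summ_eval.rouge_metric import RougeMetric

summaries = ["the cat sat on the mat ."]
references = ["a cat was sitting on the mat ."]

rouge = RougeMetric()
results = rouge.evaluate_batch(summaries, references, aggregate=True)
print(results)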

dptam commented

Hi Alex,

Thanks for releasing the code. I downloaded it and noticed that it takes an input file to compute scores on. I passed in the human annotations you linked in the repo, but the JSON lines were missing several keys, such as summ_id or metric_scores_{args.subset}. I was wondering if there is a sample input file to run system_level_correlations.py on?

Thanks

Hi Derek, I just updated the code so that it should work with the model_annotations.aligned.scored.jsonl file. Please feel free to reopen this issue if you encounter any problems!
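
If it helps, this is roughly how you can inspect that file before feeding it to the script. The field names below (expert_annotations and its coherence/consistency/fluency/relevance sub-keys) are assumptions about the released data and should be checked against your copy of the file.

# Sketch for inspecting model_annotations.aligned.scored.jsonl; the assumed
# fields (expert_annotations with coherence/consistency/fluency/relevance)
# should be verified against the actual file.
import json

with open("model_annotations.aligned.scored.jsonl") as f:
    records = [json.loads(line) for line in f]

print(records[0].keys())  # check which keys the script expects

anns = records[0]["expert_annotations"]
avg_coherence = sum(a["coherence"] for a in anns) / len(anns)
print(avg_coherence)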

Hi there. I would like to get the original source article for each summary in CNN/DM. How can I obtain that? In the jsonl file I can see the generated summaries, but I couldn't find the actual source paragraph/article.