Yale-LILY/SummEval

Punctuation not consistent for M19 (Text Summarization with Pretrained Encoders)


It appears that there are no periods at the end of summaries (and possibly also sentences) in the outputs of bertsumextabs.

Is this a feature of the model?
Could this impact the reported scores if it's a preprocessing/postprocessing error?

Checking against the original outputs from https://github.com/nlpyang/PreSumm, it seems the model uses <q> tokens to separate sentences in its decoded outputs.

So this seems to be a problem in the processing of the model outputs, which could affect the scores you report.
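
For reference, here is a minimal sketch of how decoded output with <q> separators could be split back into sentences. The function name and the sample string are hypothetical. Whether each sentence keeps its own final period in the PreSumm output is not verified here.

```python
def split_decoded(decoded: str) -> list[str]:
    """Hypothetical helper: split a decoded summary on the <q> separator."""
    # Drop empty pieces so a leading or trailing <q> does not yield blank sentences.
    return [part.strip() for part in decoded.split("<q>") if part.strip()]


if __name__ == "__main__":
    # Made-up example string, not taken from the released model outputs.
    sample = "the first sentence<q>the second sentence"
    print(split_decoded(sample))
```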

I found a temporary workaround: a double space seems to mark each missing period.
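
A minimal sketch of that workaround follows. It assumes every run of two or more spaces stands for a dropped period and that the last sentence is missing its period too. The helper name is hypothetical, and the exact spacing in the released files (for example, a space before the period in tokenized text) may differ.

```python
import re


def restore_periods(summary: str) -> str:
    """Hypothetical helper: reinsert periods where a double space marks a sentence break."""
    text = summary.strip()
    # Assumption: a run of 2+ spaces not already preceded by punctuation is a dropped period.
    text = re.sub(r"(?<![.!? ]) {2,}", ". ", text)
    # Collapse any remaining runs of spaces (those that followed existing punctuation).
    text = re.sub(r" {2,}", " ", text)
    # Assumption: the final sentence lost its period as well.
    if text and text[-1] not in ".!?":
        text += "."
    return text


if __name__ == "__main__":
    # Made-up example string, not taken from the released model outputs.
    print(restore_periods("the cat sat on the mat  it was happy"))
```

It is worth spot-checking a few restored summaries against the source articles before rescoring, since a double space could also come from some other preprocessing step.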

Hi, thank you for pointing this out, and sorry for the delay in following up! You are correct that the double space corresponds to the missing period. It looks like the file we received for that model was missing the periods. For the reported scores we use bertsum-abs, so this doesn't affect the tables, but I will follow up to insert the periods in those files.