nlpyang/PreSumm

Maximum Sentence Number in the Output

StevenLau6 opened this issue · 0 comments

Thank you for sharing the code.
I tested the extractive setting on a different summarization dataset and found that at most 3 sentences are output for each sample. This may suit the CNN/DM dataset, but it may not be suitable for other datasets, where the target summaries can be longer than 3 sentences.
So I suggest modifying the code at trainer_ext.py#L275 to use the hyper-parameter self.args.max_tgt_len to control the length of the output sequence, instead of the hard-coded 3 in this condition:

if ((not cal_oracle) and (not self.args.recall_eval) and len(_pred) == 3):
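
To illustrate the idea, here is a minimal standalone sketch of a selection loop with a configurable cap. It is not the repository's code: `select_sentences`, `ranked_ids`, `sentences`, and `max_sentences` are hypothetical names used only for this example, and `max_sentences` stands in for whichever option ends up controlling the limit (for example self.args.max_tgt_len, as proposed above).

```python
def select_sentences(ranked_ids, sentences, max_sentences=3):
    """Pick up to `max_sentences` sentences, following the ranked order.

    ranked_ids:    sentence indices sorted by model score (best first)
    sentences:     source sentences of one document
    max_sentences: hypothetical cap replacing the hard-coded 3
    """
    pred = []
    for j in ranked_ids:
        # Skip indices that point past the end of the document.
        if j >= len(sentences):
            continue
        pred.append(sentences[j].strip())
        # Stop once the configurable limit is reached.
        if len(pred) >= max_sentences:
            break
    return pred


if __name__ == "__main__":
    doc = ["first.", "second.", "third.", "fourth."]
    print(select_sentences([2, 0, 5, 1, 3], doc, max_sentences=4))
```

With the default of 3 the sketch keeps the current behaviour, so only datasets with longer reference summaries would need to raise the limit.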