Pseudoreference using decent enough summarizers
forrestbao opened this issue · 0 comments
Pseudocode:
def pseudo_metric(documents: List[str], system_summaries: List[str]):
    pseudo_ref_summaries = pegasus(documents)
    rouge_scores = rouge(pseudo_ref_summaries, system_summaries)
    return rouge_scores
Let's try two summarizers for now: Google's Pegasus and Facebook's BART fine-tuned on a summarization dataset.
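The pseudocode above can be fleshed out into a runnable sketch. To keep it self-contained, the summarizer is injected as a callable (so Pegasus, BART, or anything else can be plugged in), and `rouge1_f1` below is a simplified unigram-overlap stand-in I'm adding for illustration; a real run would use a proper ROUGE implementation such as the `rouge-score` package.

```python
from collections import Counter
from typing import Callable, List


def rouge1_f1(reference: str, candidate: str) -> float:
    # Minimal unigram-overlap ROUGE-1 F1 -- a stand-in for a real
    # ROUGE implementation, good enough to show the metric's shape.
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def pseudo_metric(
    documents: List[str],
    system_summaries: List[str],
    summarize: Callable[[str], str],
) -> List[float]:
    # `summarize` is any document -> summary callable (e.g. a Pegasus
    # or BART pipeline); pseudo-references come from it, then each
    # system summary is scored against its pseudo-reference.
    pseudo_refs = [summarize(doc) for doc in documents]
    return [rouge1_f1(ref, sys) for ref, sys in zip(pseudo_refs, system_summaries)]
```

With an identity "summarizer", a system summary equal to its document scores 1.0, which is a cheap sanity check before wiring in a real model.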
Before we start, let's try using and not using HF's pipeline, and see whether they produce the same result. Specifically, one approach is
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
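To actually produce a summary without the pipeline, the tokenizer/model pair above still needs an explicit generate-and-decode step. A sketch (the `summarize` wrapper and its default lengths are my choice, mirroring the pipeline example below):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")


def summarize(document: str, min_length: int = 5, max_length: int = 20) -> str:
    # Tokenize (truncating long documents to the model's max input size),
    # generate summary token ids, then decode them back to text.
    inputs = tokenizer(document, truncation=True, return_tensors="pt")
    summary_ids = model.generate(
        inputs["input_ids"], min_length=min_length, max_length=max_length
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

This wrapper is exactly the shape of callable the pseudo-metric needs: one document in, one summary string out.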
while the other (doc here) is
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summarizer("Sam Shleifer writes the best docstring examples in the whole world.", min_length=5, max_length=20)
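An end-to-end check of whether the two approaches agree might look like the sketch below. Both paths are pinned to the same `facebook/bart-large-cnn` checkpoint and the same `min_length`/`max_length`, so any remaining difference would come from the pipeline's pre/post-processing rather than the model itself.

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "facebook/bart-large-cnn"
text = "Sam Shleifer writes the best docstring examples in the whole world."

# Approach 1: the high-level pipeline.
summarizer = pipeline("summarization", model=checkpoint)
pipeline_out = summarizer(text, min_length=5, max_length=20)[0]["summary_text"]

# Approach 2: explicit tokenizer + model + generate.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
input_ids = tokenizer(text, truncation=True, return_tensors="pt")["input_ids"]
summary_ids = model.generate(input_ids, min_length=5, max_length=20)
manual_out = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print("pipeline:", pipeline_out)
print("manual:  ", manual_out)
print("match:", pipeline_out.strip() == manual_out.strip())
```

If the two disagree, the usual suspects are generation defaults pulled from the model's config (e.g. beam search settings) and the pipeline's whitespace cleanup, both worth checking before picking one path for the pseudo-metric.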