SigmaWe/DocAsRef

PageRank-based sentence weighting

forrestbao opened this issue · 0 comments

The original BERTScore uses IDF to weight tokens. When the pairwise similarity is lifted to the sentence level, we need to weight sentences instead of tokens, so we need a new weighting method.

The idea is similar to PageRank: the importance of a page is determined by the importance of pages that link to it.

Denote the document as $x=[x_1, x_2, \dots]$ and the system summary being evaluated as $y=[y_1, y_2, \dots]$ (yes, 1-indexed).

The importance of a sentence in the document

Intuition 1: A sentence is important if many other sentences are related to it. Hence, the importance of a sentence $x_i$ is $w_i=f(d(x_i, x_1), d(x_i, x_2), \dots)$. In the simplest case, $w_i = \sum_{j \neq i} d(x_i, x_j)$. $f$ can also be the geometric mean or entropy. When it is entropy, an important sentence should have high entropy, because it is related to nearly all sentences. $d$ can be any pairwise relatedness measure where a higher value means more related, e.g., cosine similarity or a score from an MNLI model.
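A minimal sketch of the simplest case, $w_i = \sum_{j \neq i} d(x_i, x_j)$, may help make this concrete. It assumes each document sentence already has a nonzero embedding vector and takes $d$ to be cosine similarity. The function names `cosine_similarity_matrix` and `document_sentence_weights` are hypothetical and not part of DocAsRef.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b.

    Assumption: every row is a nonzero embedding vector.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def document_sentence_weights(doc_embeddings):
    """Compute w_i as the sum over j != i of d(x_i, x_j).

    Here d is cosine similarity and f is a plain sum, which is only
    one of the choices mentioned in the issue.
    """
    sim = cosine_similarity_matrix(doc_embeddings, doc_embeddings)
    np.fill_diagonal(sim, 0.0)  # exclude the j == i term
    return sim.sum(axis=1)


# Toy embeddings: sentences 1 and 2 point the same way, sentence 3 is orthogonal.
toy_doc = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
w = document_sentence_weights(toy_doc)
```

In this toy document the first two sentences support each other and the third is related to nothing, so the third receives the lowest weight. Swapping the sum for a geometric mean or an entropy would only change the last line of `document_sentence_weights`.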

The importance of a sentence in the system summary

Intuition 1: The importance of a summary sentence is the sum of its similarities to all document sentences, weighted by the document-sentence weights computed above. Thus, the importance of the $j$-th summary sentence $y_j$ is $v_j = \sum_{i} w_i \, d(x_i, y_j)$, where the sum runs over the document sentences only.
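As a rough illustration, the summary-side weights reduce to a vector-matrix product once the document weights and the document-to-summary similarities are available. The sketch below assumes both are precomputed. The function name `summary_sentence_weights` and the toy numbers are hypothetical.

```python
import numpy as np


def summary_sentence_weights(doc_weights, doc_to_summary_sim):
    """Compute v_j as the sum over i of w_i * d(x_i, y_j).

    doc_weights:        shape (n_doc,), the weights w_i of document sentences.
    doc_to_summary_sim: shape (n_doc, n_summary), entry [i, j] is d(x_i, y_j).
    """
    w = np.asarray(doc_weights, dtype=float)
    d = np.asarray(doc_to_summary_sim, dtype=float)
    if d.ndim != 2 or d.shape[0] != w.shape[0]:
        raise ValueError("doc_to_summary_sim must have one row per document sentence")
    return w @ d  # sums over document sentences i, one value per summary sentence j


# Toy example: three document sentences, two summary sentences.
w = [1.0, 1.0, 0.0]
d = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]
v = summary_sentence_weights(w, d)
```

With these toy values the first summary sentence, which resembles the two highly weighted document sentences, ends up with the larger weight.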