adamchristiansen/text-summarizer

Relevance measure duplicate sentences

adamchristiansen opened this issue · 2 comments

The relevance measure strategy produces duplicate sentences for certain configurations. The cause of this needs to be determined.

Weightings like augmented assign a non-zero weight to every term, including terms that do not occur in the sentence. This means that checking whether a term's weight is zero is not a reliable way to determine whether the term is in a sentence.
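To illustrate, here is a minimal sketch assuming "augmented" means the common augmented term frequency `0.5 + 0.5 * count / max_count`; the function name `augmented_weights` and the sample counts are hypothetical and are not taken from this repository.

```python
def augmented_weights(counts):
    """Augmented term frequency for one sentence (assumed formula).

    counts: raw occurrence count of each term in the sentence.
    """
    max_count = max(counts) if counts else 0
    if max_count == 0:
        # Empty sentence: every term still gets the 0.5 baseline.
        return [0.5 for _ in counts]
    return [0.5 + 0.5 * c / max_count for c in counts]


# Hypothetical sentence: term 0 occurs twice, term 1 never, term 2 once.
counts = [2, 0, 1]
weights = augmented_weights(counts)

# Term 1 is absent from the sentence, yet its weight is 0.5, not 0,
# so a "weight != 0" test would wrongly report it as present.
absent_term_weight = weights[1]
```

Under this assumption, a zero check on the weighted matrix treats every term as belonging to every sentence.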

The easiest way to solve this is to build a second, reference matrix. The reference matrix would be constructed using a non-normalizing binary weighting, so that its values are 1 if the term (row) is in the sentence (column) and 0 otherwise. This matrix is used to determine which words are present in a sentence, so that they can be selectively eliminated from all other sentences without clobbering entire sentences when matrix-wide non-zero weightings like augmented are used.
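A rough sketch of this idea, using plain nested lists with terms as rows and sentences as columns. The names `binary_reference` and `eliminate_terms`, and the sample matrices, are hypothetical and only illustrate the proposal; they are not the project's actual code.

```python
def binary_reference(counts):
    """Build the reference matrix: 1 if the term occurs in the sentence, else 0."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


def eliminate_terms(weights, reference, chosen):
    """Zero the weights of the chosen sentence's terms in all other sentences.

    Presence is read from the binary reference matrix, never from the
    weighted matrix, so non-zero weightings cannot mislead the check.
    """
    result = [row[:] for row in weights]  # copy; leave the input intact
    for term, ref_row in enumerate(reference):
        if ref_row[chosen] == 1:
            for sentence in range(len(ref_row)):
                if sentence != chosen:
                    result[term][sentence] = 0.0
    return result


# Hypothetical raw counts: 3 terms (rows) x 3 sentences (columns).
counts = [
    [1, 0, 1],
    [0, 2, 0],
    [1, 1, 0],
]
# Hypothetical weighted matrix where every entry is non-zero.
weights = [
    [1.0, 0.5, 1.0],
    [0.5, 1.0, 0.5],
    [1.0, 0.75, 0.5],
]

reference = binary_reference(counts)
remaining = eliminate_terms(weights, reference, chosen=0)
```

Here sentence 0 contains terms 0 and 2, so only those rows are cleared in the other columns. Term 1, which is absent from sentence 0 despite its non-zero weight there, is left untouched.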