An explanation of the coverage scorer
Closed this issue · 2 comments
Hey Pengshai,
Good questions! There are a few modifications to the coverage model from the Summary Loop paper that are not explained in detail in the Keep it Simple paper.
In particular, there are two parameters:

- `is_soft`: In the Summary Loop, we used a "hard" version of coverage: if the top-1 predicted word was the correct word, the model would get 1, otherwise 0. The "softer" version looks at the probability assigned to the top-1 word (call it P_{T1}) and the probability assigned to the correct word (P_C), and assigns a "soft score" of P_C / P_{T1}. This way, the score can take a range of values in [0, 1], making it "soft". From experiments on both summarization and simplification, I encourage using the soft coverage in all situations.
- `normalize`: This parameter was True by default in the Summary Loop, but we made it into a parameter in this version to allow more control. With it enabled, two coverage scores are computed (one with the summary/generated text, and one with the empty string as a replacement), and the difference of the first minus the second becomes the normalized score. The intent is to remove the words that are trivially guessable, which inflate the coverage score. I also recommend using this in most cases; we didn't enable it in the final version of KiS because we use a much higher masking rate, and the empty-string coverage tended to be close to zero, so the normalization had little effect. (Removing it also made the coverage scorer twice as fast.) My guess is it would have little effect in this situation, but I recommend enabling it for summarization.
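To make the two parameters concrete, here is a minimal Python sketch of the scoring logic described above. This is not the repository's actual implementation; the function names and the dict-based probability distributions are hypothetical stand-ins for the masked-LM outputs:

```python
# Hypothetical sketch of the "soft" coverage score and the normalization
# step described above (not the actual KiS / Summary Loop code).

def soft_coverage(distributions, correct_ids):
    """Mean soft score over the masked positions.

    distributions: one dict per masked position, mapping token id -> probability.
    correct_ids: the correct (masked-out) token id for each position.
    The "hard" variant would instead score 1.0 only when the top-1 token
    is the correct one, and 0.0 otherwise.
    """
    scores = []
    for dist, correct in zip(distributions, correct_ids):
        p_top1 = max(dist.values())          # probability of the top-1 token, P_{T1}
        p_correct = dist.get(correct, 0.0)   # probability of the correct token, P_C
        scores.append(p_correct / p_top1)    # soft score P_C / P_{T1}, in [0, 1]
    return sum(scores) / len(scores)


def normalized_coverage(score_with_text, score_with_empty):
    """Subtract the coverage obtained when the generated text is replaced
    by the empty string, discounting trivially guessable words."""
    return score_with_text - score_with_empty
```

For example, if a masked position's top-1 token has probability 0.6 while the correct token has probability 0.3, its soft score is 0.5, whereas the hard score would be 0.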
Let me know if you have other questions; I've experimented quite a bit with the model, and I'd love to discuss further!
Thanks for the explanation!