g8a9/ferret

Definition of `discrete_expl_th_token_ids` if `removal_args["remove_tokens"] == False`

Closed this issue · 1 comment

phiwi commented
  • ferret version: 0.4.1
  • Python version: 3.9
  • Operating System: Linux

Description

When you set `sample[id_top] = self.tokenizer.mask_token_id` (line 229) in `ferret/evaluators/faithfulness_measures.py`, shouldn't the tokens *not* in `id_top` be masked instead, since we are computing sufficiency at this point? In other words, should the code be changed to the following?

```python
sample[~id_top] = self.tokenizer.mask_token_id  # added tilde to negate the selection
```
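One caveat about the proposed change. The sketch below uses NumPy and assumes that `id_top` holds integer positions; its actual type isn't shown in this thread. `~` only means "every other position" when it is applied to a boolean mask. On an integer array it is bitwise NOT (`~i == -(i + 1)`), so `sample[~id_top]` would select positions counted from the end of the sequence, not the complement. The token IDs are made up for illustration.

```python
import numpy as np

sample = np.array([101, 2023, 3185, 2001, 2307, 102])  # made-up token IDs
id_top = np.array([2, 4])  # assumed: integer positions of the top-k tokens

# On an integer array, ~ is bitwise NOT (~i == -(i + 1)), not a complement.
print((~id_top).tolist())        # [-3, -5]
print(sample[~id_top].tolist())  # positions -3 and -5: [2001, 2023]

# With a boolean mask, ~ really does mean "every position not in id_top".
keep = np.zeros(len(sample), dtype=bool)
keep[id_top] = True
print(sample[~keep].tolist())    # [101, 2023, 2001, 102]
```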

Thank you for noticing it!
Yes. When removal uses the mask token instead of deleting the tokens (i.e., when `removal_args["remove_tokens"] == False`), sufficiency requires masking the tokens that are *not* in `id_top`, so that only the most important tokens are preserved.
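The sketch below contrasts the line quoted in the issue with the intended sufficiency behavior. It is a minimal illustration and not ferret's actual fix in #28. `mask_top_tokens` and `mask_for_sufficiency` are hypothetical helper names, and the inputs are plain lists of token IDs. `103` stands in for a tokenizer's `mask_token_id` (it is the `[MASK]` ID in BERT's uncased vocabulary) and serves only as a placeholder.

```python
import numpy as np


def mask_top_tokens(input_ids, id_top, mask_token_id):
    """Mask the top-ranked positions (what the line quoted in the issue does)."""
    masked = np.array(input_ids)  # np.array copies its input by default
    masked[id_top] = mask_token_id
    return masked


def mask_for_sufficiency(input_ids, id_top, mask_token_id):
    """Mask every position except the top-ranked ones, keeping only the rationale."""
    masked = np.array(input_ids)
    keep = np.zeros(len(masked), dtype=bool)
    keep[id_top] = True            # True at the positions to preserve
    masked[~keep] = mask_token_id  # ~ negates the boolean mask
    return masked


input_ids = [101, 2023, 3185, 2001, 2307, 102]  # made-up token IDs
id_top = [2, 4]                                 # hypothetical top-k positions
MASK_ID = 103                                   # placeholder mask_token_id

print(mask_top_tokens(input_ids, id_top, MASK_ID).tolist())
# [101, 2023, 103, 2001, 103, 102]
print(mask_for_sufficiency(input_ids, id_top, MASK_ID).tolist())
# [103, 103, 3185, 103, 2307, 103]
```

This sketch also masks special tokens such as `[CLS]` and `[SEP]` (IDs 101 and 102 here). Whether those should be kept is a separate design choice that the issue doesn't discuss.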

We fixed it in #28!
It will also be available in the next release of ferret.