Shouldn't we add an extra True token to the inverse_token_mask when adding an extra CLS token to the sentence?
ramesaliyev opened this issue · 6 comments
Hi, thank you for the article!
Here at dataset.py#L200, you are adding an extra `CLS` token to the sentence, but no extra `True` token to the `inverse_token_mask`.
This creates an alignment issue which can be seen in your example:
```
masked_sentence    [[CLS], one, of, the, other, [MASK], has, ment...
masked_indices     [0, 5, 6, 7, 8, 2, 10, 11, 4825, 13, 2, 15, 16...
sentence           [[CLS], one, of, the, other, reviewers, has, m...
indices            [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...
token_mask         [True, True, True, True, False, True, True, Fa...
is_next            1
```
So the `[MASK]` token is actually at index 5, but in the `token_mask` row it is marked at index 4 because of that. Isn't this wrong? I've checked the other parts of the code, but couldn't find anything that handles this alignment issue, which becomes even more of a problem when two sentences are concatenated, I think. A minimal sketch of what I mean is below.
Thanks!
Hi @ramesaliyev. Thanks for reaching out to us. You're right, this is definitely an issue: `inverse_token_mask` should be padded by one, just as you described. Would you like to prepare a PR?
Hi, of course! I also fixed an additional issue where `accuracy_summary` was called without `inverse_token_mask` in `Trainer`.
Amusing - I spent some time tracking down this same issue. Glad to see I'm not crazy! Good catch, @ramesaliyev.
Hey, I think this may not work yet. When I run the code it throws an error - I think because it is trying to add `[True]` to `inverse_token_mask` even when `inverse_token_mask` is `None`, because `should_mask` is `False`.
Maybe I'm doing something wrong! In which case, apologies.
@j-standfast Ah, you're right! In my code, I slightly changed the part where `inverse_token_mask` is created, which is why I missed it here. Since `_pad_sentence` doesn't do anything with `inverse_token_mask` when it is `None`, I think we can simply make the following change:
from

```python
sentence, inverse_token_mask = self._pad_sentence([self.CLS] + sentence, [True] + inverse_token_mask)
```

to

```python
sentence = [self.CLS] + sentence
if inverse_token_mask is not None:
    inverse_token_mask = [True] + inverse_token_mask
sentence, inverse_token_mask = self._pad_sentence(sentence, inverse_token_mask)
```
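Just to sanity-check the `None` path (a rough standalone sketch, not the repo's code): when `should_mask` is `False` and the mask is `None`, nothing is prepended, and `_pad_sentence` can keep ignoring it as before:

```python
# No masking: inverse_token_mask is None, so the guarded prepend is skipped
# and the earlier "[True] + None" TypeError can no longer happen.
inverse_token_mask = None
sentence = ["one", "of", "the", "other", "reviewers"]

sentence = ["[CLS]"] + sentence
if inverse_token_mask is not None:
    inverse_token_mask = [True] + inverse_token_mask

assert inverse_token_mask is None
```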
I can open a PR if this is ok with everyone.
@ramesaliyev, please do it 🚀