coaxsoft/pytorch_bert

Shouldn't we add an extra True token to the inverse_token_mask when adding the extra CLS token to the sentence?

ramesaliyev opened this issue · 6 comments

Hi, thank you for the article!

Here at dataset.py#L200, you are adding an extra CLS token to the sentence, but no extra True token to the inverse_token_mask.

This creates an alignment issue which can be seen in your example:

masked_sentence    [[CLS], one, of, the, other, [MASK], has, ment...
masked_indices     [0, 5, 6, 7, 8, 2, 10, 11, 4825, 13, 2, 15, 16...
sentence           [[CLS], one, of, the, other, reviewers, has, m...
indices            [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,...
token_mask         [True, True, True, True, False, True, True, Fa...
is_next                                                            1

So the [MASK] token is actually at index 5, but in the token_mask row it is marked at index 4 because of that. Isn't this wrong? I've checked other parts of the code but couldn't find anything that handles this alignment issue, which becomes even more of a problem when two sentences are concatenated, I think.
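
To make the shift concrete, here is a minimal standalone sketch (the values are illustrative, not taken from dataset.py):

# illustrative sketch of the off-by-one, not the repo's actual code
sentence = ['one', 'of', 'the', 'other', '[MASK]', 'has']
inverse_token_mask = [True, True, True, True, False, True]  # False marks the masked position

sentence = ['[CLS]'] + sentence             # the sentence gets the extra token...
print(sentence.index('[MASK]'))             # 5
print(inverse_token_mask.index(False))      # 4  -> ...but the mask is now off by one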

Thanks!

Hi @ramesaliyev. Thanks for reaching out to us. You're right, this is definitely an issue: inverse_token_mask should be padded by one extra True, just as you described. Would you like to prepare a PR?

Hi, of course! I also fixed an additional issue where accuracy_summary was called without inverse_token_mask in Trainer.

Amusing - I spent some time tracking down this same issue. Glad to see I'm not crazy! Good catch, @ramesaliyev.

Hey, I think this may not work yet. When I run the code it throws an error, I think because it tries to add [True] to inverse_token_mask even when inverse_token_mask is None (because should_mask is False).
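
Concretely, the prepend fails with something like this (minimal repro, not the repo's exact traceback):

inverse_token_mask = None        # should_mask is False, so no mask was built
[True] + inverse_token_mask      # TypeError: can only concatenate list (not "NoneType") to list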

Maybe I'm doing something wrong! In which case, apologies.

@j-standfast Ah, you're right! In my code I slightly changed the part where inverse_token_mask is created, which is why I missed it here. Since _pad_sentence doesn't do anything with inverse_token_mask when it is None, I think we can simply make the following change:

from

sentence, inverse_token_mask = self._pad_sentence([self.CLS] + sentence, [True] + inverse_token_mask)

to

sentence = [self.CLS] + sentence
if inverse_token_mask is not None:
    # keep the mask aligned with the prepended CLS token
    inverse_token_mask = [True] + inverse_token_mask

sentence, inverse_token_mask = self._pad_sentence(sentence, inverse_token_mask)
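
As a quick sanity check of both branches (standalone sketch with illustrative values; the repo uses self.CLS):

CLS = '[CLS]'
for inverse_token_mask in (None, [True, True, False, True]):
    sentence = ['one', 'of', '[MASK]', 'the']
    sentence = [CLS] + sentence
    if inverse_token_mask is not None:
        inverse_token_mask = [True] + inverse_token_mask
    print(sentence, inverse_token_mask)
# the mask stays None when masking is off, otherwise it stays aligned with the CLS token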

I can open a PR if this is ok with everyone.

@ramesaliyev, please do it 🚀