graykode/nlp-tutorial

Some problems about Bert

tfighting opened this issue · 2 comments

line 70: index = randint(0, vocab_size - 1) # random index in vocabulary.
I think the replacement index shouldn't be able to land on '[CLS]', '[SEP]', or '[MASK]'!

Yes, that's right. So the code should change like this:

if random() < 0.8:  # 80%
    input_ids[pos] = word_dict['[MASK]']  # make mask
elif random() > 0.9:
    index = randint(0, vocab_size - 1)
    while index < 4:  # ids 0-3 are the special tokens {'[PAD]': 0, '[CLS]': 1, '[SEP]': 2, '[MASK]': 3}, meaningless as replacements
        index = randint(0, vocab_size - 1)
    input_ids[pos] = index

How about just:
index = randint(4, vocab_size - 1)
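Putting the two suggestions together, here is a minimal sketch of the whole masking branch. It assumes the four special tokens occupy ids 0-3, as stated in the thread; `mask_position` and `NUM_SPECIAL` are hypothetical names introduced only for this example. One more thing it changes: a second `random()` call in the `elif` draws a fresh number, so `elif random() > 0.9` only fires about 2% of the time overall (0.2 × 0.1). Drawing once and reusing the value gives the usual 80% mask / 10% random / 10% unchanged split.

```python
from random import random, randint

# Assumption: special tokens take ids 0-3, as in the thread.
word_dict = {'[PAD]': 0, '[CLS]': 1, '[SEP]': 2, '[MASK]': 3}
NUM_SPECIAL = len(word_dict)  # first id that is a regular word


def mask_position(input_ids, pos, vocab_size):
    """Hypothetical helper: apply the 80/10/10 rule to one position in place."""
    r = random()  # draw once so the three branches split a single interval
    if r < 0.8:  # 80%: replace with [MASK]
        input_ids[pos] = word_dict['[MASK]']
    elif r < 0.9:  # 10%: replace with a random regular word
        # randint is inclusive on both ends, so this never returns a special id
        input_ids[pos] = randint(NUM_SPECIAL, vocab_size - 1)
    # remaining 10%: leave the original token unchanged
    return input_ids
```

Starting `randint` at 4 does the same job as the `while index < 4` loop without retrying, as long as `vocab_size` is greater than 4.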