liuwei1206/LEBERT

Mask problem for words matched to characters in a sentence

seyoulala opened this issue · 3 comments

for idy in range(sent_length):
    now_words = matched_words[idy]
    now_word_ids = self.word_vocab.convert_items_to_ids(now_words)
    matched_word_ids[idy][:len(now_word_ids)] = now_word_ids
    matched_word_mask[idy][:len(matched_word_ids)] = 1

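Here the length of matched_word_ids is max_seq_len, while the length of matched_word_mask[idy] is max_num_word, and max_seq_len > max_num_word. So regardless of whether the current character matched any words, don't the pad positions all take part in the attention computation?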

Hi,

The length of matched_word_ids is not always max_seq_len; it is dynamic. It equals the number of words matched from the lexicon, capped at max_seq_length. So wherever the mask is 0, that value is excluded from the attention computation.
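To illustrate why a mask value of 0 removes a word slot from attention, here is a minimal NumPy sketch of additive masking before a softmax. The function name `masked_attention_weights` and the constant -10000.0 are assumptions chosen for illustration; this is a common masking pattern, not necessarily the exact form used in the repository.

```python
import numpy as np

def masked_attention_weights(scores, mask):
    """Softmax over the last axis, with mask == 0 positions pushed to ~0 weight.

    scores and mask have the same shape; mask holds 1 for real words, 0 for padding.
    The additive constant is an illustrative assumption.
    """
    masked = scores + (1 - mask) * -10000.0
    # Subtract the row maximum for numerical stability.
    masked = masked - masked.max(axis=-1, keepdims=True)
    exp = np.exp(masked)
    return exp / exp.sum(axis=-1, keepdims=True)

scores = np.array([1.0, 2.0, 3.0])

# Third slot is padding: it should receive no attention weight.
print(masked_attention_weights(scores, np.array([1, 1, 0])))

# If the mask is wrongly all ones, the padded slot competes like a real word.
print(masked_attention_weights(scores, np.array([1, 1, 1])))
```

The second call shows the failure mode raised in this issue: when every mask entry is 1, padded slots receive attention weight like any matched word.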

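I read the code: max_seq_length is a fixed maximum length. I also ran the code and found that the mask is 1 even for characters that matched no words. Replacing matched_word_mask[idy][:len(matched_word_ids)] = 1 with matched_word_mask[idy][:len(now_word_ids)] = 1 seems to be the correct fix.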
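The difference between the two slice bounds can be reproduced in isolation. The sketch below assumes both arrays have shape (max_seq_len, max_num_word), which matches the lengths described in this thread but is not confirmed here; `build_word_mask` and the toy id lookup are hypothetical stand-ins for the repository's code and for `self.word_vocab.convert_items_to_ids`.

```python
import numpy as np

def build_word_mask(matched_words, max_seq_len, max_num_word, use_fix):
    """Return the word mask built with the original or the proposed slice bound."""
    # Assumed shapes: one row per character, one column per matched-word slot.
    matched_word_ids = np.zeros((max_seq_len, max_num_word), dtype=np.int64)
    matched_word_mask = np.zeros((max_seq_len, max_num_word), dtype=np.int64)
    for idy, now_words in enumerate(matched_words):
        # Hypothetical stand-in for self.word_vocab.convert_items_to_ids.
        now_word_ids = [len(word) for word in now_words]
        matched_word_ids[idy][:len(now_word_ids)] = now_word_ids
        if use_fix:
            # Proposed: mark only the slots that hold a matched word.
            matched_word_mask[idy][:len(now_word_ids)] = 1
        else:
            # Original: len(matched_word_ids) is max_seq_len, so the whole row is set.
            matched_word_mask[idy][:len(matched_word_ids)] = 1
    return matched_word_mask

# Three characters: two matched words, none, and one.
matched_words = [["中国", "中国人"], [], ["人"]]
print(build_word_mask(matched_words, max_seq_len=8, max_num_word=3, use_fix=False)[:3])
print(build_word_mask(matched_words, max_seq_len=8, max_num_word=3, use_fix=True)[:3])
```

With the original bound, every row for a real character is all ones, including the character with no matched words. With the proposed bound, each row has exactly as many ones as matched words.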

Hi,

Yes, you are right. The line should be matched_word_mask[idy][:len(now_word_ids)] = 1. This is not the original code, so some errors may have been introduced during the rewrite (for simplification).

Thank you very much for the correction!