<unk> in the second notebook
pietz opened this issue · 3 comments
Could somebody explain why we manually change the unk_init from zeros to a normal distribution and then afterwards overwrite the weights in the embedding layer from the normally distributed values back to zeros? This seems redundant.
The unk_init isn't just used to set the initial embedding of the <unk> token; it is used to set the initial embedding of every token in your vocabulary that is not in your pre-trained embeddings. E.g. if the word "bananas" is in your vocabulary but not in your pre-trained embeddings, then the embedding for "bananas" will be initialized from whatever you specify your unk_init function to be.
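To make that concrete, here's a minimal sketch of the behaviour in plain NumPy (the vocab, vectors, and unk_init below are hypothetical stand-ins, not torchtext's actual internals): every vocab token missing from the pre-trained set gets a vector drawn from unk_init, including <unk> and <pad> themselves.

```python
import numpy as np

# Hypothetical toy vocab and pre-trained vectors standing in for
# a real torchtext vocab and GloVe embeddings.
vocab = ["<unk>", "<pad>", "the", "bananas"]      # "bananas" is not pre-trained
pretrained = {"the": np.array([0.1, 0.2, 0.3])}
dim = 3

def unk_init(dim, rng):
    # Called for every vocab token missing from the pre-trained vectors --
    # here N(0, 1), matching torch.Tensor.normal_ in the notebook.
    return rng.normal(loc=0.0, scale=1.0, size=dim)

rng = np.random.default_rng(0)
embedding = np.stack([
    pretrained[w] if w in pretrained else unk_init(dim, rng)
    for w in vocab
])

# Only "the" keeps its pre-trained vector; <unk>, <pad> and "bananas"
# all come from unk_init.
```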
That said, I was told it was good practice to initialize the <unk> token to all zeros, but I no longer believe this is the case; I think only the <pad> token should be initialized to zeros - although, from experimenting, it makes basically no difference to the final results.
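The overwrite step the question asks about can be sketched in PyTorch like this (a hedged illustration, not the notebook's exact code; the sizes are made up, and UNK_IDX/PAD_IDX assume torchtext's usual convention of <unk> at index 0 and <pad> at index 1): first copy the full vector matrix into the embedding layer, then zero out just the <unk> and <pad> rows.

```python
import torch
import torch.nn as nn

# Hypothetical sizes and indices for illustration.
VOCAB_SIZE, EMBEDDING_DIM = 4, 3
UNK_IDX, PAD_IDX = 0, 1

embedding = nn.Embedding(VOCAB_SIZE, EMBEDDING_DIM, padding_idx=PAD_IDX)

# Stand-in for vocab.vectors: every row already filled, either from the
# pre-trained embeddings or from unk_init.
pretrained_vectors = torch.randn(VOCAB_SIZE, EMBEDDING_DIM)
embedding.weight.data.copy_(pretrained_vectors)

# Overwrite only the <unk> and <pad> rows with zeros; all other rows
# (pre-trained or unk_init-initialized) are left untouched.
embedding.weight.data[UNK_IDX] = torch.zeros(EMBEDDING_DIM)
embedding.weight.data[PAD_IDX] = torch.zeros(EMBEDDING_DIM)
```

So the unk_init choice still matters for every out-of-pre-trained-vocabulary word; only the two special tokens get zeroed afterwards.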
Thanks @bentrevett. That makes perfect sense. I'm not an NLP expert but I would also question why zero vectors are the default choice for unknown words.
I am not sure either. The only related work I am aware of is this, which tries different embedding initialization techniques and finds that there's not much difference between zeros, Xavier, He, N(0, 0.1), N(0, 0.01) and N(0, 0.001) - see table 1 on page 4.
However, their experiments focus on initializing all of the embeddings, not just those outside the pre-trained embedding vocabulary, which they initialize with N(0, 0.01) - see the last paragraph of section 3.1.