DSKSD/DeepNLP-models-Pytorch

Why do Skip-gram models need 2 embedding layers?

yu45020 opened this issue · 0 comments

Hi SungDong. Thanks for the great posts. I am reading the first two skip-gram models. Why do you use two embeddings instead of one? After training, the second embedding, embedding_u, ends up with the same weights in every row. Based on the formula for this model, I think a single embedding for all word vectors should be enough. Am I missing some detail?
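
To be concrete, the formula I mean is the standard skip-gram softmax (which I understand the notebooks to follow), with a separate center-word vector v_c and a context-word vector u_o for each word w in the vocabulary of size V:

P(o \mid c) = \frac{\exp(u_o^{\top} v_c)}{\sum_{w=1}^{V} \exp(u_w^{\top} v_c)}

If u and v came from the same table this would still be well-defined, which is why a single embedding looks sufficient to me on paper.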


Is the second matrix there for efficiency? I guess it could be replaced by a linear transformation of the transposed size, but since the prediction target is a one-hot vector, that would waste work multiplying a lot of zeros. A matrix lookup is far more efficient.
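
A rough sketch of the comparison I have in mind (toy sizes and variable names of my own, not code from the repo):

import torch
import torch.nn as nn

vocab_size, projection_dim = 10, 4                        # toy sizes for illustration

# "output" side written as a linear map from the projection back to vocab space
out = nn.Linear(projection_dim, vocab_size, bias=False)   # weight shape: (vocab_size, projection_dim)

# the same numbers stored as an embedding table and indexed directly
emb_u = nn.Embedding(vocab_size, projection_dim)
emb_u.weight.data.copy_(out.weight.data)

target = torch.tensor([3])                                # index of one context word
one_hot = torch.zeros(1, vocab_size)
one_hot[0, target] = 1.0

row_via_matmul = one_hot @ out.weight                     # multiplies mostly zeros
row_via_lookup = emb_u(target)                            # just reads one row

print(torch.allclose(row_via_matmul, row_via_lookup))     # True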

import torch
import torch.nn as nn

class Skipgram(nn.Module):

    def __init__(self, vocab_size, projection_dim):
        super(Skipgram, self).__init__()
        self.embedding_v = nn.Embedding(vocab_size, projection_dim)  # center-word vectors (v)
        self.embedding_u = nn.Embedding(vocab_size, projection_dim)  # context-word vectors (u)

        self.embedding_v.weight.data.uniform_(-1, 1)  # init
        self.embedding_u.weight.data.uniform_(0, 0)   # init: all zeros
        #self.out = nn.Linear(projection_dim, vocab_size)
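
For reference, this is how I understand the two tables being used later in the forward pass; this is my own sketch of the full-softmax objective, not a copy of the repo's code (center words go through embedding_v, target and vocabulary words through embedding_u):

    def forward(self, center_words, target_words, all_vocab):
        center = self.embedding_v(center_words)                   # (B, 1, D) center-word vectors
        target = self.embedding_u(target_words)                   # (B, 1, D) context-word vectors
        all_u  = self.embedding_u(all_vocab)                      # (B, V, D) all context vectors

        score = target.bmm(center.transpose(1, 2)).squeeze(2)     # (B, 1) numerator logits
        norm  = all_u.bmm(center.transpose(1, 2)).squeeze(2)      # (B, V) denominator logits

        # negative log-likelihood of the softmax over the whole vocabulary
        return -torch.mean(score - torch.log(torch.exp(norm).sum(1, keepdim=True)))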