skip-gram

A simple implementation of the Skip-Gram model in PyTorch.

File organization

main.py —— the training process

model.py —— the model definition (a minimal sketch of the model and training step follows this list)

getData.py —— data pre-processing and organization (uses torch.utils.data.DataLoader to enable batching)

text8, simtext2 —— the data files; simtext2 is the smaller one.
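
For orientation, here is a minimal sketch of the pieces the files above provide: a two-embedding Skip-Gram model trained with the negative-sampling loss of Mikolov et al. (2013), with batches served through torch.utils.data.DataLoader. All names here (SkipGram, in_embed, the toy tensors) are illustrative, not taken from the repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

class SkipGram(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        # separate "input" embeddings for center words and
        # "output" embeddings for context / negative words
        self.in_embed = nn.Embedding(vocab_size, embed_dim)
        self.out_embed = nn.Embedding(vocab_size, embed_dim)

    def forward(self, center, context, negatives):
        # center: (B,), context: (B,), negatives: (B, K) word indices
        v = self.in_embed(center)                                # (B, D)
        u_pos = self.out_embed(context)                          # (B, D)
        u_neg = self.out_embed(negatives)                        # (B, K, D)
        pos_score = (v * u_pos).sum(dim=1)                       # (B,)
        neg_score = torch.bmm(u_neg, v.unsqueeze(2)).squeeze(2)  # (B, K)
        # negative-sampling objective of Mikolov et al. (2013)
        return -(F.logsigmoid(pos_score)
                 + F.logsigmoid(-neg_score).sum(dim=1)).mean()

# toy stand-ins for the (center, context, negatives) index triples
# that getData.py would build from the corpus
centers = torch.randint(0, 10000, (4096,))
contexts = torch.randint(0, 10000, (4096,))
negatives = torch.randint(0, 10000, (4096, 5))
loader = DataLoader(TensorDataset(centers, contexts, negatives),
                    batch_size=512, shuffle=True)

model = SkipGram(vocab_size=10000, embed_dim=100)
optimizer = torch.optim.SGD(model.parameters(), lr=0.025)
for center, context, negatives in loader:  # one epoch; main.py presumably loops similarly
    optimizer.zero_grad()
    loss = model(center, context, negatives)
    loss.backward()
    optimizer.step()
```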

If you encounter "RuntimeWarning: divide by zero encountered in true_divide" at the line sampling_p = (np.sqrt(fre_np / 0.001) + 1) * 0.001 / fre_np, consider decreasing the value of vacabulary_size (for example, to 1000): on a small dataset some vocabulary slots end up with zero frequency, and the division by fre_np fails for those entries.
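
That line is the subsampling formula of Mikolov et al. (2013), p(w) = (sqrt(f(w)/t) + 1) * t / f(w) with t = 0.001, where fre_np holds each word's relative frequency f(w). A tiny self-contained reproduction (the counts are invented for illustration) shows why a zero-frequency vocabulary slot triggers the warning, and one way to mask it:

```python
import numpy as np

t = 0.001
counts = np.array([500, 120, 30, 0])  # one vocabulary slot never occurred
fre_np = counts / counts.sum()        # relative frequencies f(w)

# p(w) = (sqrt(f(w)/t) + 1) * t / f(w); the zero entry divides by zero,
# emits the RuntimeWarning, and leaves inf in sampling_p
sampling_p = (np.sqrt(fre_np / t) + 1) * t / fre_np

# besides shrinking vacabulary_size, masking the zero entries also works
sampling_p = np.where(fre_np > 0, sampling_p, 0.0)
```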

Results

The results on the English text are as follows; the Chinese word vectors are still training.

task               | this repo | CCL2017 paper
word relatedness   | 69.88%    | 69.36%
syntactic question | 16.84%    | 54.24%
semantic question  | 9.59%     | 45.59%
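
For reference, "word relatedness" is typically scored as the correlation between cosine similarities and human ratings, while the "syntactic question" and "semantic question" rows are word-analogy accuracies: "a is to b as c is to ?" is answered with the nearest neighbour of b - a + c. Below is a sketch of that analogy step, assuming embeddings, word2idx, and idx2word come from the trained model; the repo's actual evaluation code may differ.

```python
import numpy as np

def analogy(a, b, c, embeddings, word2idx, idx2word):
    # normalize rows so dot products are cosine similarities
    norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = norm[word2idx[b]] - norm[word2idx[a]] + norm[word2idx[c]]
    sims = norm @ (query / np.linalg.norm(query))
    for w in (a, b, c):  # the question words themselves don't count
        sims[word2idx[w]] = -np.inf
    return idx2word[int(np.argmax(sims))]

# e.g. analogy("man", "king", "woman", ...) should return "queen"
# if the embeddings capture the regularity
```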

References

Mikolov, Tomas, et al. "Distributed Representations of Words and Phrases and their Compositionality." Advances in Neural Information Processing Systems. 2013.

Li, Fang, and Xiaojie Wang. "Improving Word Embeddings for Low Frequency Words by Pseudo Contexts." CCL. 2017.