gdp is a set of distributed representation (word embedding) models written in PyTorch. The code includes skip-gram and CBOW.
gdp requires:
- python 3.6.6
- pytorch-cpu 1.0.0
- numpy 1.15.4
- tqdm 4.28.1
You can install gdp by running the following command.
pip install git+https://github.com/RottenFruits/gdp
Here is an example that trains a simple skip-gram model.
from gdp import distributed_representation as dr
from gdp import corpus as cp
# toy corpus: one sentence per element
data = [
    'he is a king',
    'she is a queen',
    'he is a man',
    'she is a woman',
    'warsaw is poland capital',
    'berlin is germany capital',
    'paris is france capital',
]

# build the vocabulary and training corpus
corpus = cp.Corpus(data = data, mode = "a", max_vocabulary_size = 5000, max_line = 0,
                   minimum_freq = 0)

window_size = 1      # number of context words on each side of the center word
embedding_dims = 30  # dimensionality of the learned word vectors
batch_size = 128

dr_sg = dr.DistributedRepresentation(corpus, embedding_dims, window_size, batch_size,
                                     model_type = "skip-gram", ns = 0, trace = True)
dr_sg.train(num_epochs = 101, learning_rate = 0.05)
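
To make the window_size parameter concrete, here is a minimal sketch, independent of gdp's internals, of how skip-gram training pairs are typically generated with window_size = 1: each center word predicts its immediate neighbors.

# Minimal sketch (not gdp's code): generate (center, context) skip-gram
# pairs from one sentence with window_size = 1.
sentence = 'he is a king'.split()
window_size = 1

pairs = []
for i, center in enumerate(sentence):
    lo = max(0, i - window_size)                  # left edge of the window
    hi = min(len(sentence), i + window_size + 1)  # right edge of the window
    for j in range(lo, hi):
        if j != i:
            pairs.append((center, sentence[j]))

print(pairs)
# [('he', 'is'), ('is', 'he'), ('is', 'a'), ('a', 'is'), ('a', 'king'), ('king', 'a')]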
If you want to use negative sampling, set ns = 1 and specify the number of negative samples:
dr_sgns = dr.DistributedRepresentation(corpus, embedding_dims, window_size, batch_size,
                                       model_type = "skip-gram", ns = 1, negative_samples = 5, trace = True)
dr_sgns.train(num_epochs = 101, learning_rate = 0.05)
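
For background, negative sampling avoids the full softmax over the vocabulary by scoring the true context word against negative_samples randomly drawn words. The following is a minimal PyTorch sketch of that objective; it is not gdp's implementation, and all tensor names and shapes are illustrative.

import torch
import torch.nn.functional as F

# Minimal sketch of the skip-gram negative-sampling (SGNS) objective.
vocab_size, dim, batch, k = 5000, 30, 128, 5

in_emb  = torch.nn.Embedding(vocab_size, dim)   # center-word vectors
out_emb = torch.nn.Embedding(vocab_size, dim)   # context-word vectors

center   = torch.randint(vocab_size, (batch,))     # center word ids
context  = torch.randint(vocab_size, (batch,))     # true context word ids
negative = torch.randint(vocab_size, (batch, k))   # k sampled negatives each

v     = in_emb(center)      # (batch, dim)
u_pos = out_emb(context)    # (batch, dim)
u_neg = out_emb(negative)   # (batch, k, dim)

pos_score = (v * u_pos).sum(dim=1)                        # (batch,)
neg_score = torch.bmm(u_neg, v.unsqueeze(2)).squeeze(2)   # (batch, k)

# Maximize log sigmoid(u_pos . v) plus sum of log sigmoid(-u_neg . v)
loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(dim=1)).mean()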
If you want to use the CBOW architecture, replace model_type = "skip-gram" with model_type = "cbow".
More example code is in the example directory; please check it as well.
gdp includes:
- skipgram
- skipgram with negative sampling
- cbow
- cbow with negative sampling
References:
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality.
- Implementing word2vec in PyTorch (skip-gram model)
- fanglanting/skip-gram-pytorch
- tensorflow/tensorflow/examples/tutorials/word2vec/word2vec_basic.py