gdp is a set of distributed representation (word embedding) models written in PyTorch. The code includes skip-gram and CBOW.
gdp requires:
- python 3.6.6
- pytorch-cpu 1.0.0
- numpy 1.15.4
- tqdm 4.28.1
You can install gdp by running the following command.
pip install git+https://github.com/RottenFruits/gdp
Here is an example that trains a simple skip-gram model.
from gdp import distributed_representation as dr
from gdp import corpus as cp
# toy corpus: one sentence per element
data = [
    'he is a king',
    'she is a queen',
    'he is a man',
    'she is a woman',
    'warsaw is poland capital',
    'berlin is germany capital',
    'paris is france capital',
]

# build the vocabulary and training corpus
corpus = cp.Corpus(data = data, mode = "a", max_vocabulary_size = 5000, max_line = 0,
                   minimum_freq = 0)

window_size = 1      # number of context words on each side of the center word
embedding_dims = 30  # dimensionality of the learned word vectors
batch_size = 128

dr_sg = dr.DistributedRepresentation(corpus, embedding_dims, window_size, batch_size,
                                     model_type = "skip-gram", ns = 0, trace = True)
dr_sg.train(num_epochs = 101, learning_rate = 0.05)
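
To make the window_size parameter concrete, here is a minimal sketch, independent of gdp's internals, of how skip-gram training pairs are typically generated with window_size = 1: each center word predicts its immediate neighbors.

# Minimal sketch (not gdp's code): generate (center, context) skip-gram
# pairs from one sentence with window_size = 1.
sentence = 'he is a king'.split()
window_size = 1

pairs = []
for i, center in enumerate(sentence):
    lo = max(0, i - window_size)                  # left edge of the window
    hi = min(len(sentence), i + window_size + 1)  # right edge of the window
    for j in range(lo, hi):
        if j != i:
            pairs.append((center, sentence[j]))

print(pairs)
# [('he', 'is'), ('is', 'he'), ('is', 'a'), ('a', 'is'), ('a', 'king'), ('king', 'a')]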
If you want to use negative sampling, set ns = 1 and specify the number of negative samples:
dr_sgns = dr.DistributedRepresentation(corpus, embedding_dims, window_size, batch_size,
                                       model_type = "skip-gram", ns = 1, negative_samples = 5, trace = True)
dr_sgns.train(num_epochs = 101, learning_rate = 0.05)
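
For background, negative sampling avoids the full softmax over the vocabulary by scoring the true context word against negative_samples randomly drawn words. The following is a minimal PyTorch sketch of that objective; it is not gdp's implementation, and all tensor names and shapes are illustrative.

import torch
import torch.nn.functional as F

# Minimal sketch of the skip-gram negative-sampling (SGNS) objective.
vocab_size, dim, batch, k = 5000, 30, 128, 5

in_emb  = torch.nn.Embedding(vocab_size, dim)   # center-word vectors
out_emb = torch.nn.Embedding(vocab_size, dim)   # context-word vectors

center   = torch.randint(vocab_size, (batch,))     # center word ids
context  = torch.randint(vocab_size, (batch,))     # true context word ids
negative = torch.randint(vocab_size, (batch, k))   # k sampled negatives each

v     = in_emb(center)      # (batch, dim)
u_pos = out_emb(context)    # (batch, dim)
u_neg = out_emb(negative)   # (batch, k, dim)

pos_score = (v * u_pos).sum(dim=1)                        # (batch,)
neg_score = torch.bmm(u_neg, v.unsqueeze(2)).squeeze(2)   # (batch, k)

# Maximize log sigmoid(u_pos . v) plus sum of log sigmoid(-u_neg . v)
loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(dim=1)).mean()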
If you want to use the CBOW architecture, replace model_type = "skip-gram" with model_type = "cbow".
More example code is in the example directory; please check it as well.
gdp includes:
- skipgram
- skipgram with negative sampling
- cbow
- cbow with negative sampling
References:
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality.
- Implementing word2vec in PyTorch (skip-gram model)
- fanglanting/skip-gram-pytorch
- tensorflow/tensorflow/examples/tutorials/word2vec/word2vec_basic.py