torch2vec

A PyTorch implementation of skip-gram word2vec with negative sampling.

The utils.py file contains the helper functions for corpus I/O and other corpus processing.

The SkipW2V.py file implements the skip-gram word2vec architecture with negative sampling.
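For orientation, here is a minimal sketch of what a skip-gram model with negative sampling can look like in PyTorch. It is not the code in SkipW2V.py: the class name `SkipGramNS`, the tensor shapes, the vocabulary size, the embedding dimension, and the choice of five negatives per pair are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkipGramNS(nn.Module):
    """Skip-gram with negative sampling (hypothetical class, not taken from SkipW2V.py)."""

    def __init__(self, vocab_size: int, embed_dim: int):
        super().__init__()
        # Separate embedding tables for center ("input") and context ("output") words.
        self.center = nn.Embedding(vocab_size, embed_dim)
        self.context = nn.Embedding(vocab_size, embed_dim)

    def forward(self, center_ids, context_ids, negative_ids):
        # center_ids, context_ids: (batch,); negative_ids: (batch, k)
        v = self.center(center_ids)          # (batch, dim)
        u_pos = self.context(context_ids)    # (batch, dim)
        u_neg = self.context(negative_ids)   # (batch, k, dim)

        # Dot products between the center word and its true / sampled contexts.
        pos_score = (v * u_pos).sum(dim=1)                        # (batch,)
        neg_score = torch.bmm(u_neg, v.unsqueeze(2)).squeeze(2)   # (batch, k)

        # Negative of: log sigmoid(u_pos . v) + sum_k log sigmoid(-u_neg_k . v)
        loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(dim=1))
        return loss.mean()


# Toy usage with random word ids (assumed sizes, for illustration only).
torch.manual_seed(0)
model = SkipGramNS(vocab_size=100, embed_dim=16)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

center_ids = torch.randint(0, 100, (8,))
context_ids = torch.randint(0, 100, (8,))
negative_ids = torch.randint(0, 100, (8, 5))  # 5 negatives per (center, context) pair

loss = model(center_ids, context_ids, negative_ids)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Keeping two embedding tables follows the usual word2vec formulation. After training, the center table is typically the one used as the word vectors.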

The main.py file is the entry point for training the model and for testing it on individual words.

Example commands:

Training:

python ./main.py -c ../data/1bwc50000.txt -w 2 -min 0 -ll 50000 -tsize 10000 -nex 5 -opt sgd -e 15

Testing with the word "man":

python ./main.py --train_test test -words man

To do:

Optimize passes over the data.

  • Implement subsampling when reading the corpus.
  • Discard words that do not meet min_count.
  • Implement random batching of the data.
  • Implement an independent testing suite?
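
As a starting point for the first three to-do items, the sketch below shows one possible way to filter by minimum count, subsample frequent words, and draw shuffled batches using only the standard library. None of it exists in the repository yet: the function names, the threshold `t`, and the batch format are assumptions. The keep probability uses the rule from the original word2vec paper, where a word with relative frequency f is discarded with probability 1 - sqrt(t / f).

```python
import math
import random
from collections import Counter


def build_vocab(tokens, min_count):
    """Count tokens and keep only those seen at least min_count times."""
    counts = Counter(tokens)
    return {w: c for w, c in counts.items() if c >= min_count}


def keep_probability(count, total, t=1e-5):
    """Probability of keeping one occurrence of a word: min(1, sqrt(t / f))."""
    freq = count / total
    return min(1.0, math.sqrt(t / freq))


def subsample(tokens, vocab, t=1e-5, rng=None):
    """Drop out-of-vocabulary tokens and randomly thin out frequent words."""
    if rng is None:
        rng = random.Random()
    total = sum(vocab.values())
    return [
        w for w in tokens
        if w in vocab and rng.random() < keep_probability(vocab[w], total, t)
    ]


def random_batches(examples, batch_size, rng=None):
    """Shuffle the examples once, then yield them in consecutive batches."""
    if rng is None:
        rng = random.Random()
    examples = list(examples)
    rng.shuffle(examples)
    for i in range(0, len(examples), batch_size):
        yield examples[i:i + batch_size]


# Toy usage on a tiny corpus (hypothetical data).
tokens = "the cat sat on the mat the end".split()
vocab = build_vocab(tokens, min_count=2)          # only "the" occurs twice or more
kept = subsample(tokens, vocab, t=1.0, rng=random.Random(0))
batches = list(random_batches(range(10), 4, rng=random.Random(0)))
```

With a realistic corpus, `t` would be much smaller (the paper suggests values around 1e-5), so very frequent words are dropped most of the time while rare words are always kept.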