Load pretrained word embeddings (word2vec, GloVe format) into a `torch.FloatTensor` for PyTorch.
PyTorch is required. Please follow the installation guide on the official site.
pip install cffi
pip install torchwordemb
import torchwordemb
`torchwordemb.load_word2vec_bin(path)` reads a word2vec binary-format model from `path` and returns `(vocab, vec)`. `vocab` is a `dict` mapping each word to its index; `vec` is a `torch.FloatTensor` of size `V x D`, where `V` is the vocabulary size and `D` is the dimension of the word2vec vectors.
# vocab, vec = torchwordemb.load_word2vec_bin("/path/to/word2vec/model.bin")
vocab, vec = torchwordemb.load_word2vec_bin("./resource/word2vec.test.bin")
print(vec.size())
print(vec[vocab["user"]])
# print(vec[vocab["apple"]])
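The returned pair behaves like a plain index table: `vocab[word]` gives a row number, and that row of `vec` is the word's vector. A minimal sketch with hypothetical stand-in values (a toy 2-word, 3-dimensional model, not real word2vec output; the real `vec` is a `torch.FloatTensor`, which supports the same row indexing):

```python
# Stand-ins shaped like load_word2vec_bin's return values:
# vocab maps each word to a row index, vec holds one D-dim row per word.
# (Toy values for illustration only.)
vocab = {"user": 0, "apple": 1}
vec = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
]

def lookup(word):
    # Same indexing pattern as vec[vocab[word]] on the real tensor.
    return vec[vocab[word]]

print(lookup("apple"))  # [0.4, 0.5, 0.6]
```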
`torchwordemb.load_word2vec_text(path)` reads a word2vec text-format model from `path` and returns the same `(vocab, vec)` pair.
import torchwordemb
# load a text-format (.vec) model, e.g. fastText output
vocab, vec = torchwordemb.load_word2vec_text("/path/to/file.vec")
print(vec.size())
print(vec[vocab["老鼠會"]])
`torchwordemb.load_glove_text(path)` reads a GloVe text-format model from `path` and returns the same `(vocab, vec)` pair.
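The two text formats differ only in their first line: a word2vec text file starts with a `V D` header, while a GloVe file has one `word v1 ... vD` line per word with no header. The sketch below (plain Python, no torch, with made-up file contents) parses either layout into a word-to-index `dict` and a nested list, just to illustrate what the loaders consume:

```python
def parse_text_embeddings(lines):
    """Parse word2vec/GloVe text-format lines into (vocab, rows).

    word2vec text files begin with a "V D" header line; GloVe files do not.
    Returns a word->index dict and a list of float vectors.
    """
    lines = list(lines)
    if lines and len(lines[0].split()) == 2:
        # word2vec header: vocab size and dimension; skip it.
        lines = lines[1:]
    vocab, rows = {}, []
    for line in lines:
        parts = line.rstrip().split(" ")
        vocab[parts[0]] = len(rows)
        rows.append([float(x) for x in parts[1:]])
    return vocab, rows

# GloVe-style lines (toy values, no header):
vocab, rows = parse_text_embeddings(["the 0.1 0.2", "cat 0.3 0.4"])
print(rows[vocab["cat"]])  # [0.3, 0.4]
```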