Expand functionality to different word embedding files
Although there is a read.wordvectors
function that can read in a plain text file with vectors, the predict.word2vec
function only works on 'model' objects, which cannot be created from these word vector files.
Would it be possible to have the predict.word2vec
function work directly on the embedding matrix? That way it could be used with all types of word vector models, e.g. those trained with fasttext.
predict.word2vec does exactly the same as the function word2vec_similarity, which you can apply to 2 embedding matrices or vectors.
- That will work on embeddings trained with this package, as training is optimised for that similarity,
- but this might not be what you want if you have embeddings trained in another framework.

That being said, apply word2vec_similarity and see if it works for your embeddings.
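A minimal sketch of that suggestion, assuming your vectors sit in a plain text file (the file name `"vectors.txt"` and the query word `"king"` are placeholders):

```r
library(word2vec)

## read an embedding matrix from a plain text vectors file (e.g. fasttext .vec output)
emb <- read.wordvectors("vectors.txt", type = "txt", normalize = TRUE)

## similarity of one word's vector against all rows of the matrix;
## type = "cosine" is the safer choice for embeddings trained outside this package
word2vec_similarity(emb["king", , drop = FALSE], emb, top_n = 5, type = "cosine")
```

Note that drop = FALSE keeps the single row as a 1 x dim matrix, which is the shape word2vec_similarity expects.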
Note that if you need embedding models with subwords, you might as well use sentencepiece_download_model from the sentencepiece R package. It downloads a sentencepiece tokenizer alongside an embedding model trained on Wikipedia, and these are compatible with this R package.
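A hedged sketch of that route; the argument names and values below (language label, vocab_size, dim) are assumptions based on the Wikipedia-trained models the package wraps, so check ?sentencepiece_download_model for the exact signature:

```r
library(sentencepiece)

## download a sentencepiece tokenizer plus subword embeddings trained on Wikipedia;
## "English", vocab_size and dim are assumed example values, not verified defaults
dl <- sentencepiece_download_model("English", vocab_size = 5000, dim = 100)

## the returned object points to the downloaded tokenizer/embedding files on disk
str(dl)
```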