explosion/sense2vec

how to do vector arithmetic?

sleepandpancakes opened this issue · 4 comments

How do I use the API to do manual vector arithmetic on vectorized words/phrases?
For example, adding an arbitrary vector to the vector corresponding to a word and returning the result?
Or linearly interpolating between two vectorized words and converting the result back to the corresponding word?

You can obtain the vectors like this (see example in the readme):

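import spacy

# Load a spaCy pipeline and add the sense2vec component to it
nlp = spacy.load("en_core_web_sm")
s2v = nlp.add_pipe("sense2vec")
# Load the pretrained sense2vec vectors from disk
s2v.from_disk("/path/to/s2v_reddit_2015_md")

doc = nlp("A sentence about natural language processing.")
# doc[3:6] is the span "natural language processing"
vector = doc[3:6]._.s2v_vec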

You can then use e.g. NumPy to do whatever vector arithmetic you need on the embeddings you obtained.
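
To make that concrete, here is a minimal sketch of the two operations asked about (adding an offset and linear interpolation). The three-dimensional arrays are toy stand-ins for vectors like `doc[3:6]._.s2v_vec`, and `lerp` is a hypothetical helper, not part of sense2vec:

```python
import numpy as np

# Toy stand-ins for sense2vec vectors; real ones have many more dimensions.
vec_a = np.array([1.0, 0.0, 2.0], dtype=np.float32)
vec_b = np.array([0.0, 4.0, 2.0], dtype=np.float32)
offset = np.array([0.5, 0.5, 0.5], dtype=np.float32)

# Adding an arbitrary vector is plain element-wise addition.
shifted = vec_a + offset


def lerp(a, b, t):
    """Linear interpolation: returns a at t=0, b at t=1 (hypothetical helper)."""
    return (1.0 - t) * a + t * b


# The point halfway between the two vectors.
midpoint = lerp(vec_a, vec_b, 0.5)
```

The results are ordinary arrays, so they do not correspond to any word by themselves.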

Thank you. Is there a way to take an arbitrary vector and find the closest corresponding word in the vocab? I'm still having a bit of trouble understanding how I would do this.

What you're looking for is a nearest neighbor search. sense2vec doesn't expose this in the public API, but there are a lot of tools for this - sorted by complexity/overhead/capabilities from low to high:
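
As a rough illustration of the low-overhead end of that range, a brute-force cosine-similarity search can be written with NumPy alone. This is only a sketch under assumptions: `nearest_keys` is a hypothetical helper, and `keys` and `matrix` are toy data standing in for the keys and vectors you would collect from a loaded model:

```python
import numpy as np


def nearest_keys(query, keys, matrix, n=3):
    """Return the n (key, similarity) pairs closest to `query` by cosine similarity.

    Hypothetical helper: `matrix` holds one vector per row, aligned with `keys`.
    """
    query = np.asarray(query, dtype=np.float64)
    matrix = np.asarray(matrix, dtype=np.float64)
    eps = 1e-12  # guards against division by zero for all-zero vectors
    q = query / (np.linalg.norm(query) + eps)
    m = matrix / (np.linalg.norm(matrix, axis=1, keepdims=True) + eps)
    sims = m @ q  # cosine similarity of every row with the query
    top = np.argsort(-sims)[:n]  # indices of the n highest similarities
    return [(keys[i], float(sims[i])) for i in top]


# Toy vocabulary (assumption); keys follow the "word|SENSE" pattern.
keys = ["cat|NOUN", "dog|NOUN", "car|NOUN"]
matrix = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.3, 0.0],
    [0.0, 0.1, 1.0],
])

# An arbitrary vector: halfway between the "cat" and "dog" rows.
query = 0.5 * matrix[0] + 0.5 * matrix[1]
result = nearest_keys(query, keys, matrix, n=2)
```

This compares the query against every row, so the cost grows linearly with vocabulary size; for large vocabularies an approximate nearest neighbor index is usually the better fit.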

thank you again