How to do vector arithmetic?
sleepandpancakes opened this issue · 4 comments
How do I use the API to do manual vector arithmetic on vectorized words/phrases?
For example, adding an arbitrary vector to the vector corresponding to a word and returning the result?
Or linear interpolation between two vectorized words, converting the result back to the corresponding word?
You can obtain the vectors like this (see the example in the README):
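```python
import spacy

nlp = spacy.load("en_core_web_sm")
# Add the sense2vec component and load the pretrained vectors from disk
s2v = nlp.add_pipe("sense2vec")
s2v.from_disk("/path/to/s2v_reddit_2015_md")
doc = nlp("A sentence about natural language processing.")
# doc[3:6] is the span "natural language processing"; get its sense2vec vector
vector = doc[3:6]._.s2v_vec
```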
You can then use e.g. NumPy to do any vector arithmetic you like on the embeddings you obtained.
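As a minimal sketch of that NumPy arithmetic, the two operations from the question are plain array expressions. The helper names `add_vector` and `lerp` are hypothetical, and the toy 3-dimensional arrays only stand in for real `_.s2v_vec` values (which are much longer); with real data you would pass the vectors obtained from the spans instead.

```python
import numpy as np


def add_vector(word_vec: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """Element-wise sum of a word/phrase vector and an arbitrary offset vector."""
    return word_vec + offset


def lerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Linear interpolation: returns a at t=0, b at t=1, points in between otherwise."""
    return (1.0 - t) * a + t * b


# Hypothetical toy vectors standing in for real `_.s2v_vec` values
vec_a = np.array([1.0, 0.0, 2.0], dtype=np.float32)
vec_b = np.array([3.0, 4.0, 0.0], dtype=np.float32)

shifted = add_vector(vec_a, np.array([0.5, 0.5, 0.5], dtype=np.float32))
midpoint = lerp(vec_a, vec_b, 0.5)
print(shifted)   # [1.5 0.5 2.5]
print(midpoint)  # [2. 2. 1.]
```

The resulting vectors generally won't match any entry in the vector table exactly, so mapping them back to a word needs a nearest neighbor lookup rather than more arithmetic.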
Thank you. Is there a way to take an arbitrary vector and find the closest corresponding word in the vocab? I'm still having a bit of trouble understanding how I would do this.
What you're looking for is a nearest neighbor search. sense2vec doesn't expose this in its public API, but there are many tools for it, sorted here by complexity, overhead and capabilities from low to high:
- In-memory solutions like scikit-learn's KNN implementation
- File-based solutions like Annoy or FAISS
- Vector DBs like Weaviate, Pinecone, etc.
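For the simplest in-memory case, an exact brute-force search can be written directly in NumPy. This is a swapped-in technique, not one of the tools listed above and not part of sense2vec: it ranks every key by cosine similarity to the query (a common choice for word embeddings, assumed here). The function name `nearest_keys` and the toy keys and vectors are hypothetical; the keys only mimic sense2vec's `phrase|SENSE` format.

```python
import numpy as np


def nearest_keys(query, keys, vectors, n=1):
    """Return the n (key, score) pairs whose vectors are most cosine-similar to query.

    Exact brute-force search: the query is compared against every row, which is
    fine for small tables but gets slow as the table grows.
    """
    matrix = np.asarray(vectors, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Normalise rows and query (guarding against zero vectors) so that a dot
    # product equals cosine similarity.
    row_norms = np.maximum(np.linalg.norm(matrix, axis=1, keepdims=True), 1e-12)
    query_norm = max(float(np.linalg.norm(query)), 1e-12)
    scores = (matrix / row_norms) @ (query / query_norm)
    # Indices of the n highest scores, best first.
    top = np.argsort(-scores)[:n]
    return [(keys[i], float(scores[i])) for i in top]


# Hypothetical toy table: keys mimic sense2vec's "phrase|SENSE" format.
keys = ["cat|NOUN", "dog|NOUN", "car|NOUN"]
vectors = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]

# Add an arbitrary offset to the "cat" vector, then map the result back to a key.
query = np.array(vectors[0]) + np.array([0.0, 0.7, 0.0])
best_key, best_score = nearest_keys(query, keys, vectors, n=1)[0]
print(best_key)  # dog|NOUN
```

Searching with a word's own vector will normally return that word first, so you may want to skip the query key in the results. To run this on a real table you need all keys and vectors; as an assumption to verify against your installed version, the standalone `Sense2Vec` object's `items()` yields `(key, vector)` pairs, so `keys, vectors = zip(*s2v.items())` could build the inputs. For larger tables, index-based tools like Annoy or FAISS avoid comparing against every row, and scikit-learn's `NearestNeighbors(metric="cosine")` offers the same kind of search behind a standard API.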
Thank you again!