Plug sense2vec into your spaCy pipeline
myeghaneh opened this issue · 2 comments
I want to add my own sense2vec to my own spaCy model. As described in the documentation,
I added this to my current pipeline config:
[initialize.components]
[initialize.components.sense2vec]
data_path = "/path/to/s2v"
Then I ran:
nlp = spacy.load("../data/ModelV05b/model-best")
s2v = nlp.add_pipe("sense2vec")
s2v.from_disk("../data/S2VFasttextV04")
It does not work; it fails with:
[E090] Extension '_s2v' already exists on Doc. To overwrite the existing extension, set `force=True` on `Doc.set_extension`.
I think this is because sense2vec is already in nlp.component_names:
['tok2vec',
'tagger',
'parser',
'ner',
'attribute_ruler',
'lemmatizer',
'sense2vec']
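If the loaded model already ships a sense2vec component, one option is to reuse it rather than add a second one. The helper below is only a sketch: `get_or_add_pipe` is a hypothetical name, not part of spaCy or sense2vec, and it relies on `nlp.component_names`, `nlp.get_pipe` and `nlp.add_pipe` from spaCy v3. Whether this avoids the E090 error in this particular setup is an assumption, not something confirmed in this thread.

```python
def get_or_add_pipe(nlp, name):
    """Return the component called `name`, adding it only if it is missing.

    `nlp` is expected to be a spaCy v3 Language object (assumption).
    """
    if name in nlp.component_names:
        # Already part of the pipeline: reuse the existing instance.
        return nlp.get_pipe(name)
    # Not present yet: add it and return the new instance.
    return nlp.add_pipe(name)
```

With a loaded pipeline this would be used as `s2v = get_or_add_pipe(nlp, "sense2vec")`, followed by `s2v.from_disk(...)` as before.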
Then I changed it to load only my model:
nlp = spacy.load("../data/ModelV05b/model-best")
It still does not work. When I run:
doc = nlp("The testimony of the ages confirms that the motions of the planets are orbicular.")
assert doc[1:2].text == "testimony"
freq = doc[1:2]._.s2v_freq
vector = doc[1:2]._.s2v_vec
most_similar = doc[1:2]._.s2v_most_similar(3)
I get:
AttributeError: 'NoneType' object has no attribute 'get_freq'
I have a similar issue here.
I've located the source of the issue. Here is the smallest case I can make that demonstrates it.
import spacy
s2v_path = "../s2v_old"
nlp1 = spacy.load("en_core_web_sm")
s2v = nlp1.add_pipe("sense2vec")
s2v.from_disk(s2v_path)
nlp2 = spacy.load("en_core_web_sm")
s2v = nlp2.add_pipe("sense2vec")
s2v.from_disk(s2v_path)
# Uncomment the next line to make this pass
# s2v.first_run = False
nlp1("hello world")
nlp2("hello world")
The error gets thrown when evaluating nlp2, in the init_component call. This call adds all the extensions for the convenience s2v functions to the Doc object. It succeeds if only a single pipeline is created, but the second pipeline tries to add the same extensions and fails. This can be worked around by setting the internal first_run variable to False on the second instance of the sense2vec component, but that is extremely hacky.
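To make the failure mode concrete, here is a toy model of it in plain Python. It is not sense2vec's or spaCy's actual code: `ToyDoc` and `ToyComponent` are hypothetical stand-ins, and the only things borrowed from the report are the per-instance `first_run` flag, the `init_component` step and the `_s2v` extension name.

```python
class ToyDoc:
    """Stand-in for a class with one global extension registry."""
    _extensions = {}

    @classmethod
    def set_extension(cls, name, force=False):
        if name in cls._extensions and not force:
            raise ValueError(f"Extension '{name}' already exists")
        cls._extensions[name] = True


class ToyComponent:
    """Stand-in for a component that registers extensions on first call."""

    def __init__(self):
        self.first_run = True  # per-instance flag, but the registry is global

    def init_component(self):
        ToyDoc.set_extension("_s2v")

    def __call__(self, text):
        if self.first_run:
            self.init_component()
            self.first_run = False
        return text


first = ToyComponent()
second = ToyComponent()
first("hello world")  # registers "_s2v" in the shared registry

try:
    second("hello world")  # its own flag is still True, so it registers again
    second_failed = False
except ValueError:
    second_failed = True

third = ToyComponent()
third.first_run = False  # the workaround: skip registration entirely
third("hello world")
```

The mismatch is that the flag lives on each instance while the registry is shared by all of them, which is why flipping the flag by hand on later instances gets past the error.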
The "correct" solution here is probably to stop trying to be smart about adding the extension functions, and just always add them when the sense2vec library is available. If sense2vec is not part of the current pipeline, the _s2v extension will be None and all calls to the extension functions will fail.
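One way to make the registration safe to repeat is to check before registering. The sketch below is an assumption about how that could look, not a patch taken from sense2vec: `ensure_extension` is a hypothetical helper, and `StandInDoc` exists only so the example runs without spaCy installed. spaCy's `Doc` exposes `has_extension` and `set_extension` as class methods, so it could be passed in place of the stand-in.

```python
def ensure_extension(doc_cls, name, **kwargs):
    """Register `name` on `doc_cls` only if it is not registered yet."""
    if not doc_cls.has_extension(name):
        doc_cls.set_extension(name, **kwargs)


class StandInDoc:
    """Minimal stand-in that mimics a strict extension registry."""
    _registry = {}

    @classmethod
    def has_extension(cls, name):
        return name in cls._registry

    @classmethod
    def set_extension(cls, name, **kwargs):
        if name in cls._registry:
            raise ValueError(f"Extension '{name}' already exists")
        cls._registry[name] = kwargs


# Calling the helper twice is harmless: the second call is a no-op.
ensure_extension(StandInDoc, "_s2v", default=None)
ensure_extension(StandInDoc, "_s2v", default=None)
```

With a check like this, it no longer matters how many pipelines or component instances try to register the same extension.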