MartinoMensio/spacy-dbpedia-spotlight

Error comes with some short phrases

Benja1972 opened this issue · 3 comments

I get a strange error with this small example:

import spacy
nlp = spacy.blank('en')
nlp.add_pipe('dbpedia_spotlight', config={'confidence': 0.4})

txt  = 'one must keep the working memory footprint'
doc = nlp(txt)

The error I get is as follows:

ValueError: [E1010] Unable to set entity information for token 5 which is included in more than one span in entities, blocked, missing or outside.

Hi @Benja1972 ,
Thanks for opening the issue. What is happening in your case is that, with your input, DBpedia Spotlight finds two entities that overlap (they share the word "memory"):

  • working memory: http://dbpedia.org/resource/Memory_footprint
  • memory footprint: http://dbpedia.org/resource/Memory_footprint
spaCy, on the other hand, does not allow overlapping spans in doc.ents and therefore raises this error.
What you can do in this case is turn off this library's default overwrite_ents option, which prevents the exception from being raised. The outputs of DBpedia Spotlight will then only be saved in doc.spans['dbpedia_spotlight'] (or in another span group, which can be customised by passing the config argument span_group when adding the pipeline component):
import spacy
txt  = 'one must keep the working memory footprint'
nlp = spacy.blank('en')

# disable overwriting the doc entities
nlp.add_pipe('dbpedia_spotlight', config={'confidence': 0.4, 'overwrite_ents':False})
doc = nlp(txt)
print(doc.spans['dbpedia_spotlight'])


# additionally you can also specify the name of the span group to be used
nlp = spacy.blank('en')
nlp.add_pipe('dbpedia_spotlight', config={'confidence': 0.4, 'overwrite_ents':False, 'span_group': 'foo'})
doc = nlp(txt)
print(doc.spans['foo'])
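
If you still want non-overlapping entities in doc.ents, the overlap has to be resolved before assigning them. spaCy provides spacy.util.filter_spans for Span objects, which prefers the (first) longest span. The snippet below is only a rough, dependency-free sketch of that idea on plain (start, end, text) tuples; the helper names overlapping_tokens and drop_overlaps and the tuple layout are assumptions made for illustration, not part of spaCy or of this library.

```python
# Token indices for the example sentence:
# one(0) must(1) keep(2) the(3) working(4) memory(5) footprint(6)
# Spans are (start, end, text) with an exclusive end, like spaCy's Span.start / Span.end.
candidates = [
    (4, 6, "working memory"),
    (5, 7, "memory footprint"),
]


def overlapping_tokens(spans):
    """Return the sorted token indices covered by more than one span."""
    seen, clashes = set(), set()
    for start, end, _ in spans:
        for i in range(start, end):
            (clashes if i in seen else seen).add(i)
    return sorted(clashes)


def drop_overlaps(spans):
    """Greedy filter: prefer longer spans, then earlier ones; skip spans that reuse a token."""
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept, used = [], set()
    for start, end, text in ordered:
        if not used.intersection(range(start, end)):
            kept.append((start, end, text))
            used.update(range(start, end))
    return sorted(kept)


print(overlapping_tokens(candidates))  # [5]
print(drop_overlaps(candidates))       # [(4, 6, 'working memory')]
```

Token 5 ("memory") is the one named in the E1010 message. Both candidates have the same length, so this sketch keeps the earlier one; whether that is the entity you want depends on your use case.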

Let me know if this works for you!

Best,
Martino

Hi @MartinoMensio ,
Thank you for the clarification. I will try this approach.

Best regards
Sergei

@Benja1972 this is now also solved by #8 without requiring extra configuration.

Best,
Martino