MartinoMensio/spacy-dbpedia-spotlight

Control confidence and support

Benja1972 opened this issue · 3 comments

Hi Martino,

Thank you for the nice package.
I would suggest adding control of the confidence and support values for DBpedia Spotlight through package parameters. It would also be useful to be able to use the "candidates" endpoint of the API instead of "annotate".

Best
Sergei

Hi @Benja1972,
Thank you very much for your suggestion! I will add control of the confidence and support parameters that are available through the DBpedia API.
Switching from the annotate endpoint to the candidates endpoint (or to spot) would be done with a similar parameter.
Does it make sense to set them only once, when the pipeline is created?

I see that three other parameters also exist: types, sparql and policy. Do you think they may be useful?
And if they are, would it make sense to set them for the whole nlp object, or to customise them for single documents?

The configuration would work in the following way:

import spacy
nlp = spacy.blank('en')
sentence = 'Joe Biden is the president of the United States'

# set the parameters when the pipeline stage is created
nlp.add_pipe('dbpedia_spotlight', config={'confidence': 0.6, 'types': 'DBpedia:Person'})
doc = nlp(sentence)
# this will output only Joe Biden
print(doc.ents)

# to change the types filter, I need to either instantiate a new nlp or change the attribute on the pipeline stage (not very clean)
nlp.get_pipe('dbpedia_spotlight').types = 'DBpedia:Place'
# then to see the updated result, I need to create a new doc
doc = nlp(sentence)
print(doc.ents)

# I cannot use the same doc object because entities are computed at document creation

Would this workflow be good for you?

If you plan to change parameters dynamically (e.g., every document has different confidence/types filter/...) maybe a more elegant solution is necessary (instead of changing the attributes of the pipeline stage every time).
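One possible shape for such a solution is a small context manager that overrides attributes on the pipeline stage for the duration of a block and then restores them. This is only a sketch: `temporary_settings` is a hypothetical helper, not part of the package, and it assumes the stage exposes its parameters as plain attributes (as `types` is used above).

```python
from contextlib import contextmanager


@contextmanager
def temporary_settings(component, **overrides):
    """Temporarily set attributes on `component`, restoring them on exit.

    Hypothetical helper: assumes the parameters are plain attributes
    of the pipeline stage.
    """
    # Remember the current values; getattr raises AttributeError on a
    # misspelled parameter name, before anything is modified.
    previous = {name: getattr(component, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(component, name, value)
        yield component
    finally:
        # Restore the original values even if processing failed.
        for name, value in previous.items():
            setattr(component, name, value)


# Intended usage (needs the package installed and network access):
# with temporary_settings(nlp.get_pipe('dbpedia_spotlight'), types='DBpedia:Place'):
#     doc = nlp(sentence)
```

Documents created inside the `with` block would use the overridden values, and documents created afterwards would go back to the configuration given to `add_pipe`.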

Let me know what you think.

Best,
Martino

Hi @Benja1972,
you can now use the candidates endpoint and control the confidence and support parameters.

import spacy
nlp = spacy.blank('en')
nlp.add_pipe('dbpedia_spotlight', config={'process': 'candidates', 'confidence': 0.7, 'support': 50000})
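
For reference, here is a rough sketch of how these settings could map onto a DBpedia Spotlight REST request. It is an illustration, not the package's actual code: `build_spotlight_url` is a hypothetical helper, the base URL is assumed to be the public demo server, and only the parameters discussed in this thread are covered.

```python
from urllib.parse import urlencode

# Assumed base URL (public demo server); a self-hosted instance would differ.
BASE_URL = 'https://api.dbpedia-spotlight.org'


def build_spotlight_url(text, language='en', process='annotate',
                        confidence=None, support=None, types=None):
    """Build the request URL for one text (hypothetical helper)."""
    if process not in ('annotate', 'spot', 'candidates'):
        raise ValueError(f'unknown process: {process!r}')
    params = {'text': text}
    # Only send the filters that were explicitly set, so the server
    # defaults apply to everything else.
    if confidence is not None:
        params['confidence'] = confidence
    if support is not None:
        params['support'] = support
    if types is not None:
        params['types'] = types
    return f'{BASE_URL}/{language}/{process}?{urlencode(params)}'


url = build_spotlight_url(
    'Joe Biden is the president of the United States',
    process='candidates', confidence=0.7, support=50000,
)
print(url)
```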

Best,
Martino

Thank you @MartinoMensio,
It is very clear to me. I like your implementation, which sets the parameters when the nlp model is initialised. I have not played much with Spotlight yet, and I don't see examples that need different annotation settings at the document level.
Best
Sergei