Inconsistent number of entities with confidence=0
martingerlach opened this issue · 2 comments
Hi,
I think there might be an issue when setting the confidence-threshold to 0 following the documentation:
import spacy
nlp = spacy.blank("en")
nlp.add_pipe('dbpedia_spotlight', config={'confidence': 0.})
The result seems to yield fewer entities than there should be.
As an example, I looked at the following sentence from a randomly chosen Wikipedia article:
Montgomery (Monty) Buell is the former chair of the Department of History and Philosophy at Walla Walla University in College Place, Washington, as well as current Professor of History.
Querying the dbpedia-spotlight API directly with confidence=0 yields 11 entities (example-query).
When using the spacy-wrapper with config={'confidence': 0.}, I get only 3 entities.
In contrast, I get 11 entities when using either config={'confidence': 0.0001} or config={'confidence': "0"}.
This seems to be a bug. Or should the documentation make clear that the confidence has to be specified as a string?
Hi @martingerlach,
Thank you very much for opening this issue! I reproduced it, and it's due to the check I use to see whether the confidence parameter is set. I had not properly considered the edge case where the value is 0., which is not "truthy", so it simply gets ignored. I will push a new version to fix that very soon!
As you already found out, using the string version is a workaround as it becomes "truthy".
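To illustrate the pitfall, here is a minimal sketch of how a truthiness check can drop a confidence of 0. This is not the actual library code: the function names and the params dictionary are hypothetical and only mimic the kind of check described above, assuming the parameter defaults to None when unset.

```python
def build_params_truthy(text, confidence=None):
    """Hypothetical request builder using a truthiness check."""
    params = {"text": text}
    # 0 and 0.0 are falsy in Python, so a confidence of 0. is
    # skipped here exactly as if it had never been set.
    if confidence:
        params["confidence"] = confidence
    return params


def build_params_explicit(text, confidence=None):
    """Hypothetical request builder that only skips unset values."""
    params = {"text": text}
    # Comparing against None keeps 0.0, since only a missing
    # value is treated as "not set".
    if confidence is not None:
        params["confidence"] = confidence
    return params


# The three values from this issue: a float zero, a tiny float,
# and the string "0" (a non-empty string is truthy).
for value in (0.0, 0.0001, "0"):
    truthy = "confidence" in build_params_truthy("some text", value)
    explicit = "confidence" in build_params_explicit("some text", value)
    print(repr(value), "kept by truthy check:", truthy,
          "| kept by None check:", explicit)
```

Only the float 0.0 is dropped by the truthiness check, which matches the observation that 0.0001 and "0" both behave as expected.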
Best,
Martino
Hi again @martingerlach,
The issue has been resolved in version v0.2.3.
It no longer matters whether you specify the parameters as strings or floats, as the spotlight API accepts them in either form.
Please update with pip install spacy-dbpedia-spotlight --upgrade and let me know if you have further issues.
Martino