microsoft/presidio

The word "96c" is incorrectly identified as PERSON

bhanu-pappala opened this issue · 1 comment

Describe the bug
The token "96c" is incorrectly identified as PERSON when the query is "what is letter 96c".
The false positive goes away if I remove the word "letter".

Expected behavior
"96c" should not be detected as PERSON in the query "what is letter 96c".

Additional context
Model used: spaCy en_core_web_lg
Package used: presidio-analyzer
Versions tried: 2.2.351 and 2.2.354

Every NER model can produce false positives. Consider trying other models, such as those from Hugging Face or Flair. Our demo website lets you easily experiment with a few selected models, and the documentation explains how to integrate models other than spaCy en_core_web_lg.
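If switching models is not an option right away, one possible stopgap is to post-filter the analyzer output. The sketch below is a hypothetical heuristic, not part of Presidio: it drops PERSON results whose covered text contains a digit, on the assumption that such spans (like "96c") are rarely real names. The helper `drop_numeric_persons` and the `Result` class are illustrative names only. `Result` is a stand-in with the same `entity_type`, `start`, `end` and `score` fields as `presidio_analyzer.RecognizerResult`, so the sketch runs without Presidio installed, and the score value is made up for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for presidio_analyzer.RecognizerResult (same field names)."""
    entity_type: str
    start: int
    end: int
    score: float


# Assumption: a PERSON span that contains a digit is unlikely to be a real name.
_DIGIT = re.compile(r"\d")


def drop_numeric_persons(text, results):
    """Return the results minus PERSON spans whose covered text contains a digit."""
    kept = []
    for r in results:
        span = text[r.start:r.end]
        if r.entity_type == "PERSON" and _DIGIT.search(span):
            continue  # likely a false positive such as "96c"
        kept.append(r)
    return kept


if __name__ == "__main__":
    text = "what is letter 96c"
    # Illustrative input: a PERSON result covering "96c" (characters 15..18).
    results = [Result("PERSON", 15, 18, 0.85)]
    print(drop_numeric_persons(text, results))  # prints []
```

With Presidio itself, the same function can be applied to the list returned by `AnalyzerEngine().analyze(text=text, language="en")`, since each returned result exposes these attributes. The trade-off is that it would also discard genuine names that happen to contain digits, so it complements, rather than replaces, trying a better-suited model.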