microsoft/presidio

How to change the default 0.85 score for `SpacyRecognizer`?

lifepillar opened this issue · 1 comments

I have tried this with Presidio 2.2.354:

from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.predefined_recognizers import SpacyRecognizer


custom_recognizer = SpacyRecognizer(ner_strength=0.25)

analyzer = AnalyzerEngine()
analyzer.registry.add_recognizer(custom_recognizer)

results = analyzer.analyze(
    text="Alice and Bob", language="en", return_decision_process=True, score_threshold=0.1
)

print(results)
print("------")
print([res.__dict__ for res in results])

The assigned score is always 0.85. How can I change that?

My goal is to define multiple SpacyRecognizers and control which takes precedence over which. At the moment, if two entities overlap, the larger one wins, or ties are resolved arbitrarily if the spans are the same. Am I missing something?

Hi, the score comes from the NLP engine's NER configuration, so you can change it there. Please see the following code snippet:

from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import SpacyNlpEngine, NerModelConfiguration

# Define which model to use
model_config = [{"lang_code": "en", "model_name": "en_core_web_lg"}]

# Score assigned to the entities returned by the spaCy model (replaces the 0.85 default)
ner_model_configuration = NerModelConfiguration(default_score=0.6)

# Create the NLP Engine based on this configuration
spacy_nlp_engine = SpacyNlpEngine(
    models=model_config, ner_model_configuration=ner_model_configuration
)

analyzer = AnalyzerEngine(nlp_engine=spacy_nlp_engine)
analyzer.analyze(...)

With the NerModelConfiguration class, you can also configure which entities the model returns and how they map to Presidio's entities, among other options. See:

https://microsoft.github.io/presidio/analyzer/nlp_engines/spacy_stanza/#how-ner-results-flow-within-presidio
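
On the precedence question: once different sources produce different scores, one way to decide between overlapping results is to keep the highest-scoring span and drop anything that overlaps it. The sketch below is only an illustration of that idea in plain Python. The `Span` class, the `resolve_by_score` function, and the `NICKNAME` entity type are hypothetical names made up for this example; this is not Presidio's API and not necessarily how Presidio resolves conflicts internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a detection result (not a Presidio class)."""

    entity_type: str
    start: int
    end: int
    score: float


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two half-open ranges [start, end) intersect."""
    return a.start < b.end and b.start < a.end


def resolve_by_score(spans: List[Span]) -> List[Span]:
    """Greedily keep the best-scoring spans and drop any that overlap them.

    Ties on score are broken by span length (longer first), then by position,
    so the outcome is deterministic rather than arbitrary.
    """
    kept: List[Span] = []
    ranked = sorted(spans, key=lambda s: (-s.score, -(s.end - s.start), s.start))
    for candidate in ranked:
        if not any(overlaps(candidate, k) for k in kept):
            kept.append(candidate)
    # Return the survivors in document order
    return sorted(kept, key=lambda s: s.start)


# "Alice and Bob": two sources disagree about the span covering "Alice"
candidates = [
    Span("PERSON", 0, 5, 0.6),
    Span("NICKNAME", 0, 5, 0.25),
    Span("PERSON", 10, 13, 0.6),
]
resolved = resolve_by_score(candidates)
print(resolved)
```

Here the lower-scored `NICKNAME` span loses to the `PERSON` span covering the same characters, which is the kind of control over precedence that distinct scores make possible.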