How to change the default 0.85 score for `SpacyRecognizer`?
lifepillar opened this issue · 1 comments
lifepillar commented
I have tried this with Presidio 2.2.354:
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.predefined_recognizers import SpacyRecognizer
custom_recognizer = SpacyRecognizer(ner_strength=0.25)
analyzer = AnalyzerEngine()
analyzer.registry.add_recognizer(custom_recognizer)
results = analyzer.analyze(
text="Alice and Bob", language="en", return_decision_process=True, score_threshold=0.1
)
print(results)
print("------")
print([res.__dict__ for res in results])
The assigned score is always 0.85. How can I change that?
My goal is to define multiple `SpacyRecognizer`s and control which takes precedence over which. At the moment, if two entities overlap, the larger one wins, and if the spans are identical the tie is resolved arbitrarily. Am I missing something?
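To make the precedence goal concrete, here is a minimal sketch of score-based overlap resolution. It is independent of Presidio: `Span`, `overlaps`, and `resolve_by_score` are hypothetical names invented for illustration, and this is not a description of how Presidio itself resolves overlaps.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a detection result (not a Presidio class)."""
    entity_type: str
    start: int
    end: int
    score: float


def overlaps(a: Span, b: Span) -> bool:
    # Half-open intervals [start, end) overlap when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def resolve_by_score(spans: List[Span]) -> List[Span]:
    """Greedily keep spans by descending score, dropping any span that
    overlaps one already kept."""
    kept: List[Span] = []
    # Highest score first; among equal scores, prefer the longer span.
    for span in sorted(spans, key=lambda s: (-s.score, -(s.end - s.start), s.start)):
        if not any(overlaps(span, k) for k in kept):
            kept.append(span)
    # Return the survivors in document order.
    return sorted(kept, key=lambda s: s.start)


# Character offsets refer to the text "Alice and Bob".
spans = [
    Span("PERSON", 0, 5, 0.25),        # "Alice" from a lower-priority recognizer
    Span("CUSTOM_PERSON", 0, 5, 0.6),  # same span from a higher-priority recognizer
    Span("PERSON", 10, 13, 0.25),      # "Bob"
]
result = resolve_by_score(spans)
print(result)
```

With distinct scores per recognizer, identical spans no longer tie: the higher-scoring result for "Alice" is kept and the non-overlapping "Bob" result is untouched.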
omri374 commented
Hi, please see the following code snippet:
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import SpacyNlpEngine, NerModelConfiguration
# Define which model to use
model_config = [{"lang_code": "en", "model_name": "en_core_web_lg"}]
ner_model_configuration = NerModelConfiguration(default_score=0.6)
# Create the NLP Engine based on this configuration
spacy_nlp_engine = SpacyNlpEngine(
    models=model_config, ner_model_configuration=ner_model_configuration
)
analyzer = AnalyzerEngine(nlp_engine=spacy_nlp_engine)
analyzer.analyze(...)
Using the `NerModelConfiguration` class, you can further configure which entities the model returns, how they map to Presidio's entities, and more.