Configure AnalyzerEngine from file
omri374 opened this issue · 2 comments
Is your feature request related to a problem? Please describe.
In many use cases, especially around the Docker-based option, it is challenging to configure the AnalyzerEngine for a specific scenario. For example, in order to have an API supporting multiple languages, the code in app.py has to be changed:
presidio/presidio-analyzer/app.py, line 40 (commit 4db5278)
Having a way to configure which initial parameters are used (languages, nlp engine, recognizers, default score etc.) will allow a code-free configuration in both Docker based use-cases and for a more configurable Python pipeline.
Describe the solution you'd like
- Have a configuration file with all the parameters, and a utility to read it and create the custom AnalyzerEngine instance.
- Add the ability to read this conf file in the Dockerfile.
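A minimal sketch of what such a read-the-conf utility might look like. Everything here (`AnalyzerEngineConfig`, `from_dict`, the key names) is hypothetical and not an existing Presidio API; it only illustrates turning a parsed configuration mapping into the keyword arguments `AnalyzerEngine` accepts:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AnalyzerEngineConfig:
    """Hypothetical container for AnalyzerEngine's initial parameters."""

    supported_languages: List[str] = field(default_factory=lambda: ["en"])
    default_score_threshold: float = 0.0
    nlp_configuration: Optional[Dict] = None

    @classmethod
    def from_dict(cls, conf: Dict) -> "AnalyzerEngineConfig":
        """Build a config object from a mapping parsed out of a YAML/JSON file."""
        return cls(
            supported_languages=conf.get("supported_languages", ["en"]),
            default_score_threshold=conf.get("default_score_threshold", 0.0),
            nlp_configuration=conf.get("nlp_configuration"),
        )


# The mapping would normally come from yaml.safe_load(open(path)):
conf = {"supported_languages": ["en", "fr"], "default_score_threshold": 0.4}
config = AnalyzerEngineConfig.from_dict(conf)
# config.supported_languages == ["en", "fr"]
```

The Docker entrypoint could then construct the engine from the parsed object, e.g. `AnalyzerEngine(supported_languages=config.supported_languages, default_score_threshold=config.default_score_threshold)`, with no code changes.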
Describe alternatives you've considered
An alternative would be to document how to change app.py, but code would still have to be changed.
Additional context
Presidio already has several conf files, e.g.:
Hey! Thanks for this PR.
Can I use it with another transformer model? Like this one: https://huggingface.co/Jean-Baptiste/camembert-ner
I was thinking about using a YAML conf file like this:
```yaml
nlp_engine_name: transformers
models:
  - lang_code: fr
    model_name:
      spacy: fr_core_news_lg
      transformers: Jean-Baptiste/camembert-ner

ner_model_configuration:
  labels_to_ignore:
    - O
  aggregation_strategy: simple  # "simple", "first", "average", "max"
  stride: 16
  alignment_mode: strict  # "strict", "contract", "expand"
  model_to_presidio_entity_mapping:
    PER: PERSON
    LOC: LOCATION
    ORG: ORGANIZATION
    AGE: AGE
    ID: ID
    EMAIL: EMAIL
    PATIENT: PERSON
    STAFF: PERSON
    HOSP: ORGANIZATION
    PATORG: ORGANIZATION
    DATE: DATE_TIME
    PHONE: PHONE_NUMBER
    HCW: PERSON
    HOSPITAL: ORGANIZATION
  low_confidence_score_multiplier: 0.4
  low_score_entity_names:
    - ID
```
Can this work? Without your PR, the fr language never seems to be available.
Thanks.
Hi @GautierT, are you looking to run this through a REST API?
If not, you can configure your model using the standard NlpEngineProvider logic; for example, see this documentation.
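For the non-REST path, the flow looks roughly like this (a sketch based on Presidio's documented NlpEngineProvider usage; the conf file path and the sample text are placeholders, and this assumes the fr models referenced in the YAML above are installed):

```python
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider

# Read the NLP engine configuration (e.g. the YAML above) from file.
provider = NlpEngineProvider(conf_file="./conf/transformers.yaml")  # placeholder path
nlp_engine = provider.create_engine()

# Pass the engine to the analyzer, listing every language the conf defines.
analyzer = AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["fr"])

results = analyzer.analyze(text="Je m'appelle Jean", language="fr")
for res in results:
    print(res.entity_type, res.start, res.end, res.score)
```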
If yes, then the only additional change needed is in app.py: pass the NlpEngine into the AnalyzerEngine. Instead of this:
presidio/presidio-analyzer/app.py, line 40 (commit 5bc4b67)
Have this:
```python
import logging
import os
from logging.config import fileConfig
from pathlib import Path

from flask import Flask
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider

# LOGGING_CONF_FILE, WELCOME_MESSAGE and PATH_TO_CONF are module-level
# constants defined in app.py.


class Server:
    """HTTP Server for calling Presidio Analyzer."""

    def __init__(self):
        fileConfig(Path(Path(__file__).parent, LOGGING_CONF_FILE))
        self.logger = logging.getLogger("presidio-analyzer")
        self.logger.setLevel(os.environ.get("LOG_LEVEL", self.logger.level))
        self.app = Flask(__name__)
        self.logger.info("Starting analyzer engine")
        # Build the NLP engine from the conf file and hand it to the analyzer.
        provider = NlpEngineProvider(conf_file=PATH_TO_CONF)
        nlp_engine = provider.create_engine()
        self.engine = AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["fr"])
        self.logger.info(WELCOME_MESSAGE)
```