aphp/edsnlp

eds.negation regex matches on "une" preceding entities which include another negation subtoken

cvinot opened this issue · 0 comments

Description

The negation "preceding_regex" is currently
r"ne(?=[ \n]*(?:\w*[ \n]*){3}(?:pas|point|ni|aucun|jamais|rien))"
which matches on patterns such as:
"Situation compliquée d’une neutropénie fébrile aggravée."
"Le patient est traité d'une cure d'ALECTINIB depuis le ..."
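because the regex picks up the "ne" at the end of "une", and the lookahead then finds the "ni" / "NI" inside the entities ("neutropénie", "ALECTINIB").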

I fixed this on my side thanks to your customizable config, but figured I'd give you a heads-up.

Suggested fix, line 104 in patterns.py:

preceding_regex = [
    # ne (up to 3 words separated by spaces or newlines) pas/point/...
    r"\bne\b(?=[ \n]*(?:\w*[ \n]*){3}(?:pas|point|ni|aucun|jamais|rien))"
]
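
To illustrate the difference, here is a minimal sketch that compares the two patterns using only Python's standard `re` module. It does not go through the eds.negation pipeline, so the `re.IGNORECASE` flag, the helper `fires`, and the sentence in `true_negation` are assumptions made for illustration; the actual matcher may normalize text or apply other flags.

```python
import re

# Shared lookahead, copied from the pattern quoted in this issue.
LOOKAHEAD = r"(?=[ \n]*(?:\w*[ \n]*){3}(?:pas|point|ni|aucun|jamais|rien))"

# Assumption: case-insensitive matching, which may differ from the pipeline.
CURRENT = re.compile(r"ne" + LOOKAHEAD, re.IGNORECASE)
PROPOSED = re.compile(r"\bne\b" + LOOKAHEAD, re.IGNORECASE)


def fires(pattern: re.Pattern, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


# Sentence from the reproduction steps: no negation is present.
false_positive = "Situation aggravée par une neutropénie fébrile."
# Hypothetical sentence with a genuine "ne ... pas" negation.
true_negation = "Le patient ne présente pas de fièvre."

for name, pattern in [("current", CURRENT), ("proposed", PROPOSED)]:
    print(name, fires(pattern, false_positive), fires(pattern, true_negation))
```

With plain `re`, the unanchored pattern fires on the "ne" of "une" because the lookahead can reach the "ni" inside "neutropénie", while the `\b`-anchored pattern only accepts "ne" as a standalone word and still matches the genuine negation.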

How to reproduce the bug

Add to test_negation.py at line 32:

"Situation aggravée par une <ent negated=false>neutropénie fébrile</ent>."
"Le patient est traité d'une cure d'<ent negated=false>ALECTINIB</ent> depuis le ..."

Then run pytest.