eds.negation regex matches on "une" preceding entities which include another negation subtoken
cvinot opened this issue · 0 comments
Description
The negation "preceding_regex" is currently:
r"ne(?=[ \n]*(?:\w*[ \n]*){3}(?:pas|point|ni|aucun|jamais|rien))"
which matches on sentences such as:
"Situation compliquée d’une neutropénie fébrile aggravée."
"Le patient est traité d'une cure d'ALECTINIB depuis le ..."
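because the "ne" at the end of "une" is followed, within the lookahead window, by the "ni" / "NI" contained in the entity ("neutropénie", "ALECTINIB").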
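To make the false match concrete, here is a minimal sketch that runs the pattern above against the first sentence using only Python's standard `re` module. It does not go through the eds.negation pipeline, whose text normalisation may differ; the `re.IGNORECASE` flag and the helper variable names are assumptions made for illustration.

```python
import re

# Pattern copied verbatim from the issue.
# Assumption: case-insensitive matching, approximated here with re.IGNORECASE.
PRECEDING = re.compile(
    r"ne(?=[ \n]*(?:\w*[ \n]*){3}(?:pas|point|ni|aucun|jamais|rien))",
    flags=re.IGNORECASE,
)

text = "Situation compliquée d’une neutropénie fébrile aggravée."
match = PRECEDING.search(text)

# Recover the whitespace-delimited word that contains the matched "ne".
word_start = text.rfind(" ", 0, match.start()) + 1
word_end = text.find(" ", match.end())
containing_word = text[word_start:word_end]

print(containing_word)  # prints: d’une
```

The match lands on the last two letters of "une": the lookahead consumes "neutropé" as a partial word and then finds "ni" inside "neutropénie".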
I worked around this thanks to your customizable config, but figured I'd give a heads-up.
line 104 in patterns.py
preceding_regex = [
# ne (up to 3 words separated by spaces or newlines) pas/point/...
r"\bne\b(?=[ \n]*(?:\w*[ \n]*){3}(?:pas|point|ni|aucun|jamais|rien))"
]
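As a rough check of the effect of the word boundaries, the sketch below compares the pattern without `\b` and the pattern with `\b` using plain `re`, outside the eds.negation pipeline. The `fires` helper, the `re.IGNORECASE` flag, and the `true_negation` sentence are assumptions added for illustration; only the two patterns and the first sentence come from this issue.

```python
import re

LOOKAHEAD = r"(?=[ \n]*(?:\w*[ \n]*){3}(?:pas|point|ni|aucun|jamais|rien))"
WITHOUT_BOUNDARY = r"ne" + LOOKAHEAD       # pattern reported as current
WITH_BOUNDARY = r"\bne\b" + LOOKAHEAD      # pattern with word boundaries


def fires(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    # Assumption: case-insensitive matching.
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Sentence from the issue: no negation, should not fire.
false_positive = "Situation compliquée d’une neutropénie fébrile aggravée."
# Hypothetical sentence with a genuine "ne ... pas" negation: should fire.
true_negation = "Le patient ne présente pas de fièvre."

for name, pattern in [("without \\b", WITHOUT_BOUNDARY), ("with \\b", WITH_BOUNDARY)]:
    print(name, fires(pattern, false_positive), fires(pattern, true_negation))
```

With the boundaries in place, "ne" can no longer match inside "une" or at the start of "neutropénie", while a standalone "ne" followed by "pas" still matches.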
How to reproduce the bug
Add the following cases to test_negation.py at line 32:
"Situation aggravée par une <ent negated=false>neutropénie fébrile</ent>."
"Le patient est traité d'une cure d'<ent negated=false>ALECTINIB</ent> depuis le ..."
Then run pytest.