discontinuous text-bound annotations not working
zeljkobekcic opened this issue · 4 comments
Hello,
I ran into a problem with the regex which parses entities (https://github.com/Yevgnen/pybrat/blob/main/pybrat/parser.py#L110).
The other regexes are affected by this issue too.
Brat (from v1.3) allows multiple offset pairs separated by semicolons when an annotation is discontinuous, for example when it continues on the next line.
The official documentation provides an example of this: https://brat.nlplab.org/standoff.html.
Changing the regex to something like:
(?P<id>T\d+)\t(?P<type>[^ ]+)\ (?P<start>\d+)\ (?P<end>\d+)(;\d+\ \d+)*?\t(?P<mention>.+)
will accept the test string T1 Location 0 5;16 23 North America from the example.
Here I added an optional group which matches any additional pairs of indices.
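As a quick sanity check, here is a minimal Python sketch that runs the proposed pattern against the test string. It assumes the fields are separated by tabs, as in brat's standoff format. `parse_spans` is a hypothetical helper, not part of pybrat, and the sentence "North and South America" is assumed to be the text that the example annotates.

```python
import re

# The pattern proposed above, split over two lines for readability.
PATTERN = re.compile(
    r"(?P<id>T\d+)\t(?P<type>[^ ]+)\ (?P<start>\d+)\ (?P<end>\d+)"
    r"(;\d+\ \d+)*?\t(?P<mention>.+)"
)


def parse_spans(line):
    """Hypothetical helper: return every (start, end) pair of an entity line."""
    if PATTERN.fullmatch(line) is None:
        raise ValueError(f"not a text-bound annotation: {line!r}")
    # The second tab-separated field looks like "Location 0 5;16 23".
    offsets = line.split("\t")[1].split(" ", 1)[1]
    return [tuple(int(n) for n in pair.split()) for pair in offsets.split(";")]


line = "T1\tLocation 0 5;16 23\tNorth America"
match = PATTERN.fullmatch(line)
spans = parse_spans(line)

# Assumed source text for the example annotation.
text = "North and South America"
rebuilt = " ".join(text[start:end] for start, end in spans)
```

Note that the named groups `start` and `end` only hold the first pair (0 and 5 here), so the remaining pairs have to be recovered separately, as `parse_spans` does.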
Hi, thanks for reporting the issue and the fix. Can you check if this fix works for you? Note that since there may be multiple spans for an entity, the .start and .end fields will now only work for continuous entities.
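One way to picture this behaviour is the hypothetical sketch below. The `Entity` class, its `spans` field, and the choice to raise `ValueError` are all assumptions made for illustration, not pybrat's actual classes or behaviour.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:
    """Hypothetical stand-in for a parsed text-bound annotation."""

    id: str
    type: str
    spans: List[Tuple[int, int]]  # one (start, end) pair per fragment
    mention: str

    def _single_span(self) -> Tuple[int, int]:
        # Assumption: a discontinuous entity has no single start/end.
        if len(self.spans) != 1:
            raise ValueError(f"{self.id} is discontinuous; use .spans instead")
        return self.spans[0]

    @property
    def start(self) -> int:
        return self._single_span()[0]

    @property
    def end(self) -> int:
        return self._single_span()[1]


continuous = Entity("T2", "Location", [(0, 5)], "North")
discontinuous = Entity("T1", "Location", [(0, 5), (16, 23)], "North America")
```

Under these assumptions, `continuous.start` and `continuous.end` behave as before, while accessing them on `discontinuous` raises an error and callers read `discontinuous.spans` instead.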
I will report back later on whether it fixes the problem. Thank you very much for your quick reply!
It took a little longer to get back to it, but it looks like it's working!
Thank you very much!