spyysalo/standoff2conll

AssertionError: text mismatch and common.FormatError: b'Error verifying textbound T1 text mismatch (check encoding?)


Hi, below are the two different stack traces I get when attempting to convert brat standoff annotations to CoNLL format. Is there any way to resolve these errors?

Traceback (most recent call last):
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 134, in <module>
sys.exit(main(sys.argv))
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 124, in main
convert_directory(path, args)
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 102, in convert_directory
convert_files(files, options)
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 106, in convert_files
document = read_ann(fn, options)
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 64, in read_ann
return Document.from_standoff(
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\document.py", line 432, in from_standoff
verify_textbounds(textbounds, text)
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff.py", line 204, in verify_textbounds
raise FormatError(s.encode('utf-8'))
common.FormatError: b'Error verifying textbound T1\tperson 128 135\tPatient\r: text mismatch (check encoding?): 128-135\n "lergies"\nvs. "Patient\r"'


Traceback (most recent call last):
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff.py", line 201, in verify_textbounds
assert t.is_valid(text)
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff.py", line 44, in is_valid
assert text[self.start:self.end] == self.text,
AssertionError: text mismatch (check encoding?): 178-198
" DIAGNOSIS :\r\nC. dif"
vs. "C. difficile colitis\r"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 134, in <module>
sys.exit(main(sys.argv))
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 124, in main
convert_directory(path, args)
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 102, in convert_directory
convert_files(files, options)
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 106, in convert_files
document = read_ann(fn, options)
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 64, in read_ann
return Document.from_standoff(
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\document.py", line 432, in from_standoff
verify_textbounds(textbounds, text)
File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff.py", line 204, in verify_textbounds
raise FormatError(s.encode('utf-8'))
common.FormatError: b'Error verifying textbound T1\tproblem 178 198\tC. difficile colitis\r: text mismatch (check encoding?): 178-198\n " DIAGNOSIS :\r\nC. dif"\nvs. "C. difficile colitis\r"'
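
Both messages show a trailing `\r` on the annotation text ("Patient\r", "C. difficile colitis\r") and a `\r\n` inside the document text, so one possible cause is Windows (CRLF) line endings in the `.ann` and `.txt` files. That is a guess from the output above, not something confirmed by the tool. A minimal sketch of a pre-processing step, assuming the brat offsets were computed against LF-only text: it rewrites the files with LF line endings before running the converter. The helper names `normalize_newlines` and `normalize_directory` are hypothetical and not part of standoff2conll.

```python
from pathlib import Path


def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_directory(directory: str, suffixes=(".txt", ".ann")) -> list:
    """Rewrite matching files in place; return the names of files that changed.

    Assumption: the annotation offsets refer to the text with LF-only
    line endings, so removing the carriage returns realigns them.
    """
    changed = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        original = path.read_bytes()
        fixed = normalize_newlines(original)
        if fixed != original:
            path.write_bytes(fixed)
            changed.append(path.name)
    return changed
```

Since this overwrites files in place, it is safer to run it on a copy of the data directory. If the offsets were instead computed against the CRLF text, normalizing the `.txt` files would shift them the other way, and only the `.ann` files should be cleaned.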