lines are being cut off (after saving?)
snmnzl opened this issue · 10 comments
We noticed that the ending parts of the plain text (OCR) of the newspaper page in example.tsv were missing after saving the annotation result of each session (see the different states of the file attached). We couldn't figure out the pattern yet - our best guess is that it's connected to the split/merge function.
example.tsv_states.zip
There seems to be a problem with line breaks. For instance, in 1Annotation.tsv at line 45 there is a line break that is not supposed to exist.
Right, this happens regularly in all our files and we did not find a source for this error. In neat, the display of the lines is correct.
@cneud Unfortunately, we still notice different line numbers after editing. The unaltered original file 27646518_1892-07-05_21_335_005.tsv has 4887 lines. After editing (see file attached), my file is left with 4811 lines. The last three tokens are not the same as in the original.
27646518_1892-07-05_21_335_005_edit.zip
@cneud We still encounter this problem on a regular basis: the last files I edited were complete in NEAT and were saved. By the time they were either uploaded or reopened they had lines missing.
I attached two files to demonstrate: the first one is the first draft of the master data (ending in _M), which when opened in NEAT ends correctly at line 2351. A change was made to a set of tokens (5 tags were removed) and the data was saved (file ending in _MM); the resulting file ends at line 2346.
Creating_Masterdata_2436020X_1897-05-07_0_212_002.zip
@JZinck Ouch, too bad this apparently still occurs. Thanks for the extra information and example data. We will investigate the cause of the error.
Just to confirm, the file was only ever opened in neat and not in any other editor/tool?
I finally found the cause for that one.
The problem only occurred when the tags were edited via the accordion menu.
It did not show up when the editing was done with the hot keys.
The root cause was a missing trim() on a string that resulted in invalid line breaks being inserted into the tsv file.
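
To illustrate the kind of bug described above, here is a minimal sketch of how an untrimmed string can put a stray line break into a TSV row. This is not the actual neat code: the function names, the column layout, and the sample token are all hypothetical, and the thread does not say which string was missing the trim() or how the extra breaks led to lost lines.

```javascript
// Hypothetical helpers for illustration only; not taken from neat.

// Build one TSV row from field values without cleaning them.
function buildRowUnsafe(fields) {
  return fields.join("\t");
}

// Same, but trim every field first so leading/trailing whitespace
// (including "\n" and "\r") cannot end up inside the row.
function buildRowTrimmed(fields) {
  return fields.map((f) => String(f).trim()).join("\t");
}

// Assumed example: a tag value that picked up a trailing newline,
// as text read from a menu element might.
const edited = ["12", "Berlin", "B-LOC\n", "O"];
const next = ["13", "ist", "O", "O"];

const unsafeFile = [buildRowUnsafe(edited), buildRowUnsafe(next)].join("\n");
const trimmedFile = [buildRowTrimmed(edited), buildRowTrimmed(next)].join("\n");

// The untrimmed version splits the first token across two physical lines.
const unsafeLineCount = unsafeFile.split("\n").length;
const trimmedLineCount = trimmedFile.split("\n").length;

console.log(unsafeLineCount, trimmedLineCount);
```

Note that trim() only removes whitespace at the start and end of a string, so a line break in the middle of a value would need separate handling.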