qurator-spk/neat

lines are being cut off (after saving?)

snmnzl opened this issue · 10 comments

We noticed that the ending parts of the plain text (OCR) of the newspaper page in the example.tsv were missing after saving the annotation result of each session (see the different states of the file attached). We couldn't figure out the pattern yet - our best guess is that it's connected to the split/merge function.
example.tsv_states.zip

There seems to be a problem with line breaks. For instance, in 1Annotation.tsv at line 45, there is a line break that is not supposed to exist.
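
One way to locate such stray line breaks is to check every row against the column count of the header. The following is only a minimal sketch, not part of neat: the function name `find_malformed_lines`, the column names in the sample, and the choice to skip lines starting with `#` are all assumptions made for illustration.

```python
def find_malformed_lines(text, comment_prefix="#"):
    """Return (line_number, column_count) for rows whose tab-separated
    column count differs from that of the first (header) line."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty string after a final newline
    if not lines:
        return []
    expected = len(lines[0].split("\t"))  # the header defines the column count
    malformed = []
    for number, line in enumerate(lines[1:], start=2):
        if line.startswith(comment_prefix):
            continue  # assumption: comment lines are not data rows
        count = len(line.split("\t"))
        if count != expected:
            malformed.append((number, count))
    return malformed


# Hypothetical three-column file in which row "2" was split by a stray newline
sample = "No.\tTOKEN\tNE-TAG\n1\tBerlin\tB-LOC\n2\tist\n\tO\n"
print(find_malformed_lines(sample))
```

Both halves of a split row are reported, since neither has the expected number of columns.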

Right, this happens regularly in all our files and we did not find a source for this error. In neat, the display of the lines is correct.

cneud commented

fixed in a1bd8af

@cneud Unfortunately, we still notice different line numbers after editing. The unaltered original file 27646518_1892-07-05_21_335_005.tsv has 4887 lines. After editing (see file attached), my file is left with 4811 lines. The last three tokens are not the same as in the original.
27646518_1892-07-05_21_335_005_edit.zip

cneud commented

@snmnzl Does this issue still occur with the current version of neat?

@cneud We still encounter this problem on a regular basis: the last files I edited were complete in NEAT and were saved. By the time they were either uploaded or reopened they had lines missing.

I attached two files to demonstrate: the first one is the first draft of the master data (ending in _M), which, when opened in NEAT, ends correctly at line 2351. A change was made to a set of tokens (5 tags were removed) and the data was saved (file ending in _MM); the resulting file ends at line 2346.
Creating_Masterdata_2436020X_1897-05-07_0_212_002.zip

cneud commented

@JZinck Ouch, too bad this apparently still occurs. Thanks for the extra information and example data. We will investigate the cause of the error.

Just to confirm, the file was only ever opened in neat and not in any other editor/tool?

@cneud Correct. We have been very careful to only open them in neat.

I finally found the cause for that one.
The problem only occurred when the tags were edited via the accordion menu.
It did not show up when the editing was done by means of the hotkeys.

The root cause was a missing trim() on a string that resulted in invalid line breaks being inserted into the tsv file.
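
To illustrate this kind of failure (this is not the actual neat code; it is a sketch in Python, where `str.strip()` stands in for `trim()`), the example below shows how a field value carrying a trailing newline splits one TSV row into two, and how stripping the value before writing avoids it. The helper `write_row` and the four-column row are hypothetical.

```python
def write_row(fields, strip_values):
    """Join fields into one TSV row, optionally stripping surrounding
    whitespace (including newlines) from each field first."""
    if strip_values:
        fields = [field.strip() for field in fields]
    return "\t".join(fields) + "\n"


# Hypothetical row in which the tag value picked up a stray newline
row = ["1", "Berlin", "B-LOC\n", "O"]

broken = write_row(row, strip_values=False)
fixed = write_row(row, strip_values=True)

# The unstripped value breaks the row in two; the stripped one stays on one line
print(len(broken.splitlines()), len(fixed.splitlines()))
```

With the unstripped value, the single logical row ends up on two physical lines, each with the wrong number of columns.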

@labusch Sounds great, thx! We will get back to you in case it still occurs. Since we are constantly using the accordion menu, it would show up quickly. But for now we take this as solved!