tecoholic/ner-annotator

Annotating a word also seems to include the comma at the end

Closed this issue · 3 comments

Hi, hope you are all doing well.
I uploaded the text below to the annotator. When I annotate "Software Engineer", the result also seems to include the comma attached to the end of the word, even though the annotation tool selects only the words.

EXPERIENCE
Software Engineer,

The indices of "Software Engineer" should be (11:27), but the tool shows (11:28).
During annotation we select only the words and not the comma, yet the JSON annotation file we get has an end index that is one larger than expected.

Are you referring to the .json file that is exported? In that case (11:28) is correct since the ending index is 1 more than the actual index of the last character. For example, for the word 'hello' the indices would be (0:5).
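To illustrate, here is a minimal sketch in plain Python (not ner-annotator's own code) that assumes the uploaded text is exactly the two lines from the issue, joined by a single newline. Python slicing uses the same end-exclusive convention, so the span (11:28) should cover only the words, with the comma sitting at index 28, just outside it.

```python
# Assumption: the uploaded text is exactly these two lines, separated by "\n".
text = "EXPERIENCE\nSoftware Engineer,"

# Span as reported in the exported file: start is inclusive, end is exclusive.
start, end = 11, 28

span = text[start:end]
print(repr(span))           # the annotated words, without the comma
print(repr(text[end - 1]))  # last character inside the span
print(repr(text[end]))      # first character outside the span (the comma)
print(end - start)          # length of the span in characters

# Same convention for the 'hello' example: (0:5) covers all five letters.
print("hello"[0:5])
```

With an end-exclusive index, `end - start` gives the length of the annotated text directly, which is one reason the convention is common.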


Is this a predefined standard for annotation?

I believe so, yes. This software was initially created for spaCy, which uses the same indexing format (see here).