nert-nlp/streusle

Format extension: incorporating annotator notes?

The version of STREUSLE in Xposition contains some annotator notes on P tokens that are not included in the official release. The notes can help clarify the interpretation of the text, provide the annotator's rationale, or help cluster different usages at a finer level of granularity than the supersenses.

Should the .conllulex format have a place for these notes? One option is an extra column; another, since notes are rare, is a sentence header row.
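To make the header-row option concrete, here is a minimal sketch of how such notes could be read. It relies only on the CoNLL-U convention that lines starting with `#` are sentence-level comments, often of the form `# key = value`. The `# tnote[<token number>] = <text>` syntax and the function name `read_sentence_notes` are hypothetical, not part of .conllulex, and the token rows in the example are truncated to three columns for brevity.

```python
import re

# Hypothetical header syntax (NOT part of .conllulex today):
#   # tnote[<token number>] = <note text>
NOTE_RE = re.compile(r"^#\s*tnote\[(\d+)\]\s*=\s*(.*)$")


def read_sentence_notes(block):
    """Split one sentence block into ordinary metadata and token notes.

    Returns (meta, notes): meta maps header keys to values, and notes
    maps a 1-based token number to its note text.
    """
    meta, notes = {}, {}
    for line in block.splitlines():
        if not line.startswith("#"):
            continue  # token rows are not needed for this sketch
        match = NOTE_RE.match(line)
        if match:
            notes[int(match.group(1))] = match.group(2).strip()
        elif "=" in line:
            key, _, value = line[1:].partition("=")
            meta[key.strip()] = value.strip()
    return meta, notes


# Invented example sentence; real .conllulex rows have many more columns.
example = "\n".join([
    "# sent_id = example-001",
    "# text = We went to the store",
    "# tnote[3] = hypothetical annotator note about 'to'",
    "1\tWe\twe",
    "2\twent\tgo",
    "3\tto\tto",
    "4\tthe\tthe",
    "5\tstore\tstore",
])

meta, notes = read_sentence_notes(example)
```

Because readers that do not know about the new key would treat it as one more comment line, this option would probably be less disruptive than an extra column, though that would need to be checked against the existing conversion scripts.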

Should there also be a standard for releasing rich annotation history metadata (such as who annotated which token, original vs. adjudicated annotations, timestamps, ...)?

Maybe notes should be in a standoff TSV format (similar to tquery.py output) that gets ingested into the JSON?

Should the format distinguish token notes (tnote), lexical expression notes (lnote), and sentence notes (snote)?

Should notes also be allowed for arbitrary subsets of a sentence's tokens (e.g. "this was considered but rejected as an MWE")?
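As a rough sketch of the standoff idea, the code below reads a four-column TSV (sentence ID, note type, comma-separated token numbers, note text) and attaches the notes to sentence dicts before they are serialized to JSON. Everything here is an assumption for illustration: the column layout, the `notes` key, and the function names `load_notes` and `attach_notes` are hypothetical, and the sentence dicts are only assumed to carry a `sent_id` field. An empty token list stands for a sentence note, and a list of several token numbers covers the arbitrary-subset case.

```python
import json

NOTE_TYPES = {"tnote", "lnote", "snote"}


def load_notes(tsv_text):
    """Parse standoff notes into {sent_id: [note, ...]}.

    Assumed columns: sent_id, note type, comma-separated 1-based token
    numbers (empty for sentence notes), note text.
    """
    notes = {}
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        sent_id, note_type, tok_ids, text = line.split("\t")
        if note_type not in NOTE_TYPES:
            raise ValueError(f"unknown note type: {note_type!r}")
        toknums = [int(t) for t in tok_ids.split(",")] if tok_ids else []
        notes.setdefault(sent_id, []).append(
            {"type": note_type, "toknums": toknums, "text": text}
        )
    return notes


def attach_notes(sentences, notes):
    """Add a (hypothetical) 'notes' list to every sentence dict."""
    for sent in sentences:
        sent["notes"] = notes.get(sent["sent_id"], [])
    return sentences


# Invented standoff rows for illustration only.
tsv = (
    "ex-001\ttnote\t3\tgoal reading preferred\n"
    "ex-001\tlnote\t2,3\tthis was considered but rejected as an MWE\n"
    "ex-001\tsnote\t\tsentence is a fragment\n"
)
sentences = [{"sent_id": "ex-001"}, {"sent_id": "ex-002"}]
merged = attach_notes(sentences, load_notes(tsv))
as_json = json.dumps(merged, indent=2)
```

Keeping the token numbers as an explicit list means the same record shape can serve all three note types, so the tnote/lnote/snote distinction could arguably be a label rather than three separate mechanisms.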