UniversalDependencies/UD_Arabic-PADT

format error in MISC field (vertical bar used as transliteration symbol)

jheinecke opened this issue · 0 comments

Hi,
I think there is a minor format error in the MISC field for words which contain 'alif maddah (as in آن).
The transliteration of 'alif maddah uses the vertical bar, which should only be used as a field separator.
e.g.

sent_id = afp.20000815.0006:p3u1:
150 آن آن X U--------- _ 151 nmod 151:nmod Vform=آن|Root=OOV|Translit=|n

sent_id = ummah.20040715.0013:p7u1
47 الكلدآشوريين الكلدآشوريين X U--------- _ 42 nmod 42:nmod Vform=الكلدآشوريين|Root=OOV|Translit=Alkld|$wryyn

sent_id = afp.20000815.0110:p5u1
45 آحادي آحادي X U--------- _ 47 nmod 47:nmod Vform=آحادي|Root=OOV|Translit=|HAdy

Maybe you could use "ā" or "â" instead? It is a very minor problem (32 cases in ar_padt-ud-train.conllu, 9 in test, and just 1 in dev).
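In case it helps to locate the affected tokens, here is a rough Python sketch of such a check. It assumes that every MISC item in this treebank is meant to have the form Key=Value, so any piece left without "=" after splitting on "|" is reported. The function names are only illustrative and are not part of any UD tool.

```python
import sys


def malformed_misc_items(misc):
    """Return the '|'-separated MISC items that contain no '='.

    Assumption: all MISC items in this treebank should be Key=Value.
    """
    if misc == "_":
        return []
    return [item for item in misc.split("|") if "=" not in item]


def find_bad_misc(lines):
    """Scan CoNLL-U lines and return (sent_id, token_id, bad_items) tuples."""
    hits = []
    sent_id = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("# sent_id = "):
            sent_id = line[len("# sent_id = "):]
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")  # CoNLL-U columns are tab-separated
        if len(cols) != 10:
            continue  # not a regular token line
        bad = malformed_misc_items(cols[9])  # MISC is the 10th column
        if bad:
            hits.append((sent_id, cols[0], bad))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_misc.py ar_padt-ud-train.conllu
    with open(sys.argv[1], encoding="utf-8") as handle:
        for sent, tok, bad in find_bad_misc(handle):
            print(sent, tok, bad, sep="\t")
```

With the examples above, the pieces after the stray bar ("n", "$wryyn", "HAdy") are what would be reported, since "Translit=" itself still contains "=".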
Thanks,
regards
Johannes