Parse error of Italian
Closed · 1 comment
gifdog97 commented
I used the Italian model to predict the dependency tree and obtained the following result:
1 Il il DET RD Definite=Def|Gender=Masc|Number=Sing|PronType=Art 2 det _
2 termine termine NOUN S Gender=Masc|Number=Sing 8 nsubj:pass _ _
3 " " PUNCT FB _ 4 punct _ _
4 Tathāgata Tathāgata PROPN SP _ 2 nmod _ _
5 " " PUNCT FB _ 4 punct _ _
6 può potere AUX VM Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 8 aux _
7 essere essere AUX VA VerbForm=Inf 8 aux:pass _ _
8 letto leggere VERB V Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part 0 root _
9 come come ADP E _ 11 case _ _
10 " " PUNCT FB _ 11 punct _ _
11 tathā-gata tathā-gata NOUN S Gender=Fem|Number=Sing 8 obl _ _
12 " " PUNCT FB _ 11 punct _ _
13 o o CCONJ CC _ 16 cc _ _
14 come come ADP E _ 16 case _ _
15 " " PUNCT FB _ 16 punct _ _
16 Tathā-āgata Tathā-āgata PROPN SP _ 11 conj _ _
17 " " PUNCT FB _ 16 punct _ _
18 , , PUNCT FF _ 16 punct _ _
19 dove dove ADV B _ 22 advmod _ _
20 il il DET RD Definite=Def|Gender=Masc|Number=Sing|PronType=Art 21 det _
21 primo primo ADJ NO Gender=Masc|Number=Sing|NumType=Ord 22 nsubj _ _
22 significa significare VERB V Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 16 acl:relcl _ _
23 " " PUNCT FB _ 25 punct _ _
24 così così ADV B _ 25 advmod _ _
25 andato andare VERB V Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part 22 xcomp _
26 " " PUNCT FB _ 25 punct _ _
27 mentre mentre CCONJ CC _ 30 cc _ _
28 il il DET RD Definite=Def|Gender=Masc|Number=Sing|PronType=Art 29 det _
29 secondo secondo ADJ NO Gender=Masc|Number=Sing|NumType=Ord 30 nsubj _ _
30 significa significare VERB V Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 22 conj _ _
31 " " PUNCT FB _ 32 punct _ _
32 così venuto così venuto ADV B _ 30 advmod _ _
33 " " PUNCT FB _ 32 punct _ _
34 . . PUNCT FS _ 8 punct _ _
I think line 32 is invalid because it contains a space within a single token.
Curiously, in another sentence containing 'così venuto', the two words are treated as separate tokens:
1 Così così ADV B _ 2 advmod _ _
2 venuto venire VERB V Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part 0 root _ _
3 / / PUNCT FF _ 2 punct _ _
4 Così così ADV B _ 5 advmod _ _
5 andato andare VERB V Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part 2 conj _ _
6 . . PUNCT FS _ 2 punct _ _
Is this a bug? I'd appreciate it if you could investigate this issue.
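To find tokens like the one on line 32 in a larger output file, a check along the following lines might help. This is only a minimal sketch: it assumes the parser output is standard tab-separated, 10-column CoNLL-U (the tabs were lost when pasting above), and `find_tokens_with_spaces` and `sample` are hypothetical names made up for this illustration, not part of any library.

```python
def find_tokens_with_spaces(conllu_text):
    """Return (sentence_index, token_id, form) for each token whose FORM contains whitespace.

    Assumes tab-separated, 10-column CoNLL-U with blank lines between sentences.
    """
    hits = []
    sent_idx = 0
    seen_token = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if it had any tokens.
            if seen_token:
                sent_idx += 1
                seen_token = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comment line
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line; skipped in this sketch
        seen_token = True
        token_id, form = cols[0], cols[1]
        if any(ch.isspace() for ch in form):
            hits.append((sent_idx, token_id, form))
    return hits


# Two abbreviated sentences rebuilt from the output above, with tabs restored.
sample = "\n".join([
    "\t".join(["31", '"', '"', "PUNCT", "FB", "_", "32", "punct", "_", "_"]),
    "\t".join(["32", "così venuto", "così venuto", "ADV", "B", "_", "30", "advmod", "_", "_"]),
    "",
    "\t".join(["1", "Così", "così", "ADV", "B", "_", "2", "advmod", "_", "_"]),
    "\t".join(["2", "venuto", "venire", "VERB", "V",
               "Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part", "0", "root", "_", "_"]),
])

print(find_tokens_with_spaces(sample))  # [(0, '32', 'così venuto')]
```

Only token 32 of the first sentence is reported; the second sentence, where 'Così' and 'venuto' are separate tokens, produces no hits. Note that the CoNLL-U format itself permits spaces in the FORM and LEMMA columns, so a hit marks a token worth inspecting rather than proving the file is malformed.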