Incorrect flagging of non-projective punctuation in validator
For this sentence, the validator claims that the punctuation mark at node 14 introduces non-projectivity. However, non-projective relations (introduced by the reparandum relation) would exist even without the punctuation mark, and I think the structure of this tree adheres to the punctuation guidelines.
# sent_id = n01002058
# text = What she’s saying and what she’s doing, it — actually, it’s unbelievable.
1 What what PRON WP PronType=Int 4 obj 4:obj _
2 she she PRON PRP Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs 4 nsubj 4:nsubj SpaceAfter=No
3 ’s be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 aux 4:aux _
4 saying say VERB VBG VerbForm=Ger 17 dislocated 17:dislocated _
5 and and CCONJ CC _ 9 cc 9:cc _
6 what what PRON WP PronType=Int 9 obj 9:obj _
7 she she PRON PRP Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs 9 nsubj 9:nsubj SpaceAfter=No
8 ’s be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 9 aux 9:aux _
9 doing do VERB VBG VerbForm=Ger 4 conj 4:conj:and SpaceAfter=No
10 , , PUNCT , _ 17 punct 17:punct _
11 it it PRON PRP Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs 15 reparandum 15:reparandum _
12 — — PUNCT : _ 11 punct 11:punct _
13 actually actually ADV RB _ 17 advmod 17:advmod SpaceAfter=No
14 , , PUNCT , _ 17 punct 17:punct _
15 it it PRON PRP Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs 17 nsubj 17:nsubj SpaceAfter=No
16 ’s be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 17 cop 17:cop _
17 unbelievable unbelievable ADJ JJ Degree=Pos 0 root 0:root SpaceAfter=No
18 . . PUNCT . _ 17 punct 17:punct _
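The claim that the reparandum arc is non-projective regardless of the comma can be checked mechanically. Below is a minimal Python sketch that assumes the standard definition (an arc is non-projective when some word between head and dependent is not a descendant of the head). The head indices are copied from the sentence above; the function names are hypothetical and are not taken from the UD validator or Udapi.

```python
# ID -> HEAD for sentence n01002058, copied from the CoNLL-U above (0 = root).
HEADS = {1: 4, 2: 4, 3: 4, 4: 17, 5: 9, 6: 9, 7: 9, 8: 9, 9: 4, 10: 17,
         11: 15, 12: 11, 13: 17, 14: 17, 15: 17, 16: 17, 17: 0, 18: 17}


def dominates(heads, anc, node):
    """True if `anc` is `node` itself or one of its ancestors."""
    while node != 0:
        if node == anc:
            return True
        node = heads[node]
    return False


def nonprojective_arcs(heads):
    """Return (head, dep) arcs that span a node the head does not dominate."""
    arcs = []
    for dep, head in heads.items():
        if head == 0:
            continue
        lo, hi = sorted((head, dep))
        between = [k for k in range(lo + 1, hi) if k in heads]  # skip removed IDs
        if any(not dominates(heads, head, k) for k in between):
            arcs.append((head, dep))
    return arcs


def without_leaf(heads, node):
    """Remove a node that has no dependents (IDs are not renumbered)."""
    assert node not in heads.values(), "only leaves can be removed this simply"
    return {d: h for d, h in heads.items() if d != node}


print(nonprojective_arcs(HEADS))                    # [(15, 11)]
print(nonprojective_arcs(without_leaf(HEADS, 14)))  # [(15, 11)]
```

Removing node 14 leaves the arc 15->11 non-projective, because node 13 ("actually", attached to 17) still sits between the reparandum and its head.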
See udapi/udapi-python#52 (comment)
BTW: ud.FixPunct fixes your sentence automatically.
But from that discussion:
Yes. Attachment of punctuation should not cause non-projectivity. It can cause non-projectivity either because the punctuation node is attached non-projectively, or because it creates a gap (containing the punctuation) due to which other dependencies become non-projective. But if there is another node in the gap and the punctuation is attached to that node, then the punctuation is not taken as the cause of the non-projectivity.
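The two ways punctuation can cause non-projectivity, as described in the quote, can be made concrete by computing the gap of each arc: the words strictly between head and dependent that the head does not dominate. The following is a rough Python sketch under that assumption. The heads are copied from the sentence n01002058 above, and the helper names are hypothetical, not the validator's own code.

```python
# ID -> HEAD for sentence n01002058, copied from the CoNLL-U above (0 = root).
HEADS = {1: 4, 2: 4, 3: 4, 4: 17, 5: 9, 6: 9, 7: 9, 8: 9, 9: 4, 10: 17,
         11: 15, 12: 11, 13: 17, 14: 17, 15: 17, 16: 17, 17: 0, 18: 17}
PUNCT = {10, 12, 14, 18}  # nodes with UPOS=PUNCT


def dominates(heads, anc, node):
    """True if `anc` is `node` itself or one of its ancestors."""
    while node != 0:
        if node == anc:
            return True
        node = heads[node]
    return False


def gap(heads, head, dep):
    """Nodes strictly between `head` and `dep` that `head` does not dominate.

    A non-empty gap means the arc head -> dep is non-projective."""
    lo, hi = sorted((head, dep))
    return [k for k in range(lo + 1, hi) if not dominates(heads, head, k)]


def gaps(heads):
    """Map every non-projective arc (head, dep) to its gap."""
    result = {}
    for dep, head in heads.items():
        if head != 0:
            g = gap(heads, head, dep)
            if g:
                result[(head, dep)] = g
    return result


# Case 2 in the quote: punctuation sitting inside the gap of another arc.
for (head, dep), g in gaps(HEADS).items():
    in_gap = [k for k in g if k in PUNCT]
    print(f"arc {head}->{dep}: gap {g}, punctuation in gap {in_gap}")
# arc 15->11: gap [13, 14], punctuation in gap [14]

# Case 1 in the quote: is any punctuation node itself attached non-projectively?
print([p for p in sorted(PUNCT) if gap(HEADS, HEADS[p], p)])
# []
```

In this sentence the comma 14 is not attached non-projectively, but it does sit in the gap of the reparandum arc, which is the second case described in the quote.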
AFAICS, in this example the punctuation does not create a gap, and I don't think it would be correct to attach the comma at 14 to "actually" at 13. Or am I missing something here?
I don't think attaching 14 to 13 would be wrong; in fact, it would be my preferred choice if I were annotating this sentence manually. My reason is that the comma delimits "actually" from its head, so in some sense it belongs to "actually" (it is there because of "actually").
It was not necessarily my intention to prevent you from attaching 14 to 17. But now that I have checked my code, I see that it actually assumes that the tested punctuation node is attached to one of the nodes in the same gap. That is, the non-projectivity is not reported if the parent of the current node lies in the same gap. punct(13, 14) would thus be valid.
So the question is: are we OK with this slightly stricter constraint, or should I modify the code so that the mere presence of another non-punctuation node in the same gap makes the punctuation valid?
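The difference between the check as described (the parent of the punctuation must lie in the same gap) and the proposed relaxation (any non-punctuation node in the gap suffices) can be sketched in Python. This is an illustration only: it assumes a simple gap definition (words between head and dependent not dominated by the head), the function names are hypothetical, and the real validator code may be structured differently.

```python
# ID -> HEAD for sentence n01002058 as annotated (comma 14 attached to 17),
# and a variant with the comma reattached to "actually" (13).
HEADS_ORIG = {1: 4, 2: 4, 3: 4, 4: 17, 5: 9, 6: 9, 7: 9, 8: 9, 9: 4, 10: 17,
              11: 15, 12: 11, 13: 17, 14: 17, 15: 17, 16: 17, 17: 0, 18: 17}
HEADS_FIXED = {**HEADS_ORIG, 14: 13}
PUNCT = {10, 12, 14, 18}  # nodes with UPOS=PUNCT


def dominates(heads, anc, node):
    """True if `anc` is `node` itself or one of its ancestors."""
    while node != 0:
        if node == anc:
            return True
        node = heads[node]
    return False


def all_gaps(heads):
    """Yield the non-empty gap of every non-projective arc."""
    for dep, head in heads.items():
        if head == 0:
            continue
        lo, hi = sorted((head, dep))
        g = [k for k in range(lo + 1, hi) if not dominates(heads, head, k)]
        if g:
            yield g


def blamed_strict(heads, p):
    """Stricter rule: punctuation `p` in a gap is excused only if its own
    parent lies in the same gap."""
    return any(p in g and heads[p] not in g for g in all_gaps(heads))


def blamed_relaxed(heads, p, punct):
    """Relaxed rule: `p` is excused if any non-punctuation node shares the gap."""
    return any(p in g and all(k in punct for k in g) for g in all_gaps(heads))


for label, heads in [("punct(17, 14)", HEADS_ORIG), ("punct(13, 14)", HEADS_FIXED)]:
    print(label, blamed_strict(heads, 14), blamed_relaxed(heads, 14, PUNCT))
# punct(17, 14) True False
# punct(13, 14) False False
```

Under the stricter rule only punct(13, 14) is accepted; the relaxed rule would also accept the original punct(17, 14), because "actually" is a non-punctuation node in the same gap.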
The comma at position 14 is causing a non-projective gap (by definition). If you attach the comma to "actually", it is only "actually" that causes the non-projective gap. This is exactly what Dan wrote in the quoted discussion:
But if there is another node in the gap and the punctuation is attached to that node, then the punctuation is not taken as the cause of the non-projectivity.
So if the punctuation is just in the gap as a sibling of another node, the validator still considers this an error. If you attach the punctuation to another non-punctuation node in the gap, you'll make the validator happy.
In other words (or rather pictures):
─┮
│ ╭─╼ What
│ ┢─╼ she
│ ┢─╼ ’s
│ ╭─┾ saying
│ │ │ ╭─╼ and
│ │ │ ┢─╼ what
│ │ │ ┢─╼ she
│ │ │ ┢─╼ ’s
│ │ ╰─┶ doing
│ ┢─╼ ,
│ │ ╭─┮ it
│ │ │ ╰─╼ —
│ ┢─╼ actually │
│ ┢─╼ , │ <-- punct causing a non-projective gap = BUG
│ ┢────────────┶ it
│ ┢─╼ ’s
╰─┾ unbelievable
╰─╼ .
─┮
│ ╭─╼ What
│ ┢─╼ she
│ ┢─╼ ’s
│ ╭─┾ saying
│ │ │ ╭─╼ and
│ │ │ ┢─╼ what
│ │ │ ┢─╼ she
│ │ │ ┢─╼ ’s
│ │ ┡─┶ doing
│ │ ╰─╼ ,
│ │ ╭─┮ it
│ │ │ ╰─╼ —
│ ┢─┮ actually │ <-- non-punct causing a non-proj gap = OK
│ │ ╰─╼ , │ <-- punct in a gap, but not causing it = OK
│ ┢────────────┶ it
│ ┢─╼ ’s
╰─┾ unbelievable
╰─╼ .
EDIT: I did not read my own comment quoted by @sebschu carefully enough :-)
But if there is another node in the gap and the punctuation is attached to that node, then the punctuation is not taken as the cause of the non-projectivity.
So I actually said it right.
Ok, yeah, this makes sense. Thanks for clarifying!