stanfordnlp/CoreNLP

Tregex: infinite loop for `?<...`

tanloong opened this issue · 6 comments

Specifying the MULTI_RELATION as optional seems to cause an infinite loop of printing matches. I am using Tregex 4.2.0 downloaded from https://nlp.stanford.edu/software/tregex.shtml#Download and here is the command I used:

echo "(A (B 1) (C 2))" | tregex.sh "A ?<... { B ; C }" -filter

tregex_optional-multi-relation

P.S.

I noticed that the testSubtreePattern of TregexTest.java includes the <... and !<... but does not cover ?<... . So I gave it a try and encountered this issue.

Thanks, love it. A bug in an obscure syntax I added because I was tired of writing out the individual pieces by hand, using a variant of the feature probably no one will ever use. I'll try to figure out what's going on.

Interesting to note btw is that when the pattern is turned back into a tregex, it does not use the <... syntax. Basically, the <... gets expanded into the component pieces it represents.

I suspect the optional syntax in general is wrong somehow... there shouldn't be anything specific to the <..., considering your expression gets turned into a nested CoordinationPattern with two < and one !<

Haven't fully generalized this yet, but this also goes infinite:

echo "(A (B 1) (C 2))" | java edu.stanford.nlp.trees.tregex.TregexPattern "A ?(< B < C)" -filter

This does not:

echo "(zzz (A (B 1) (C 2)))" | java edu.stanford.nlp.trees.tregex.TregexPattern "A ?(< B)" -filter

So basically any optional CoordinationPattern is buggy?

... for reference, CoordinationPattern is how Tregex implements an and or an or internally

further bugginess in it is that the pattern

A ?(< B < C)

gets output without the optional:

(A < B < C )

this is probably easier to fix

well, this fixed the output of the CoordinationPattern, at least

a9965b2

Alright, figured out what's going on - basically, the CoordinationPattern in question wasn't checking if it had already returned once for an optional pattern, instead just always returned true for an optional. The result was it would just always accept new empty matches every time it checked if the tree had another match...

under the hood, in case it wasn't clear from the previous comments, <... is just a macro for match the named nodes, then check that there isn't a subsequent node at the end of the list, and do all of that in a conjunction CoordinationPattern

Yes. The A ?<... { B ; C } gets expanded to A ?[<1 B <2 C !<3 __].

Thanks for all the efforts and the concise explanation!