Blanks changed because of unmatched rules
unhammer opened this issue · 1 comments
Possibly related to #80: blanks are changed depending on unmatched rules.
b.rtx:
gender = m f nt ut un fn mf xpst xpsts xpsto xcomp xsup acr GD ;
number = sg pl sp ND ;
defnes = def ind ;
case = nom acc gen ;
a_det = dem rel qnt pos emph itg ;
a_clb = clb ;
sent: _.a_clb;
det: _.a_det.gender.number.case;
DP: _.gender.number.defnes;
S: _;
DP -> "DP ~> det" %det { %1 } ;
S ->
"3." det.qnt sent.clb.remspc { %1 2 }
! | "2007:" det.qnt sent { %1 2 }
;
Note the added space, which wasn't in the input:
$ rtx-comp b.rtx b.rtx.bin
$ echo '^2007<det><qnt><un><pl><date>/2007<det><qnt><un><pl><date>$^:<sent><clb>/:<sent><clb>$' | rtx-proc b.rtx.bin
^2007<det><qnt><un><pl><date>$ ^:<sent><clb>$
Now uncomment a rule that matches the sequence and force-removes the space:
$ tr -d '!' < b.rtx >c.rtx
$ rtx-comp c.rtx c.rtx.bin
$ echo '^2007<det><qnt><un><pl><date>/2007<det><qnt><un><pl><date>$^:<sent><clb>/:<sent><clb>$' | rtx-proc c.rtx.bin
^2007<det><qnt><un><pl>$^:<sent><clb>$
But the problem is also "fixed" if you drop the whole S rule with the non-matching (or partially matched) det sent sequence.
So the issue is that when we're inside a rule and the user writes { 1 _ 2 }, they almost certainly want an actual space there, so the code currently keeps empty blanks off the output queue to make sure that space appears. Unfortunately, it currently can't tell the difference between blanks between partial trees and blanks within a full tree that is currently being disassembled.
The fix for this is slightly non-trivial, but I think what we want to do is record on each blank in the processor what index in the queue it corresponds to. Then in outputAll() we first record the index of any blank that isn't part of a tree and have writeBlank() treat the ranges between those points as mini-queues, skipping empty blanks and inserting spaces as needed.
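
To make the mini-queue idea concrete, here is a rough toy model in Python. It is not the apertium-recursive code: write_all, BLANK and the tuple layout are all made up for illustration, an "empty blank" is assumed to be the empty string, and flushing unused non-empty blanks at the end of a tree is my own assumption rather than something the proposal specifies.

```python
BLANK = "_"  # hypothetical marker for a blank slot in a rule's output


def write_all(trees, blanks):
    """Toy model of the proposed blank handling (all names are hypothetical).

    trees:  list of (n_input_tokens, output_items) in input order, where
            output_items mixes token strings with BLANK slots, the way a
            rule body such as { 1 _ 2 } would.
    blanks: input blanks in order; blanks[i] sits between input tokens
            i and i+1 and may be the empty string.
    """
    out = []
    pos = 0  # index in `blanks` of the first blank inside the current tree
    for k, (width, items) in enumerate(trees):
        # Blanks strictly inside this tree form its mini-queue.
        # Empty blanks are skipped so that an explicit slot still gets a space.
        mini = [b for b in blanks[pos:pos + width - 1] if b != ""]
        for item in items:
            if item == BLANK:
                # Use the next real blank if there is one, otherwise a space.
                out.append(mini.pop(0) if mini else " ")
            else:
                out.append(item)
        # Assumption: unused non-empty blanks are flushed, not dropped.
        out.extend(mini)
        pos += width - 1
        if k < len(trees) - 1:
            # The blank between two trees is not part of any tree,
            # so it is written verbatim, even when it is empty.
            out.append(blanks[pos])
            pos += 1
    return "".join(out)
```

With two unmatched single-token trees and an empty blank between them, nothing is inserted, while a { 1 _ 2 } style output over the same empty blank still gets its space:

```python
write_all([(1, ["^2007$"]), (1, ["^:$"])], [""])
write_all([(2, ["^2007$", BLANK, "^:$"])], [""])
```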