GrammarSoft/cg3

cg-proc loses \ before @

unhammer opened this issue · 3 comments

@ should be escaped in the Apertium stream format. Transfuse sends it escaped. If it's in the morphological analyser, lt-proc passes it on escaped. But cg-proc loses the \:

$ echo '^\@/\@<thing>$' |cg-proc nob-nno.rlx.bin
^@/@<thing>$

expected: ^\@/\@<thing>$

Looks like https://github.com/TinoDidriksen/cg3/blob/master/src/ApertiumApplicator.cpp#L716 is the place to add it. However, the bf_escaped[0] == '@' check would then never be true, since the escaped string would start with \ instead. Should escaping @ depend on surface_readings?

Hm, we could check for bf[0] == '@' instead there, though really that check depends on there being an unescaped initial @ in the baseform we read.
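To make the ordering problem concrete, here is a minimal sketch, not the actual ApertiumApplicator code. Both function names are hypothetical, and the reserved-character set is an assumption that should be checked against the Apertium stream format documentation. It only shows that a test on the raw baseform keeps working after @ is added to the escaped set, whereas a test on the escaped string would see \ first. It does not address whether the @ was escaped in the input.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the cg3 API: backslash-escape characters
// that are reserved in the stream format. The set below is an assumption.
std::string escape_reserved(const std::string& raw) {
    static const std::string reserved = "^$/<>{}\\*@#+~[]";
    std::string out;
    out.reserve(raw.size() + 4);
    for (char c : raw) {
        if (reserved.find(c) != std::string::npos) {
            out += '\\';  // prefix each reserved character with a backslash
        }
        out += c;
    }
    return out;
}

// Hypothetical check on the raw (unescaped) baseform. Its result does not
// change when '@' becomes one of the escaped characters.
bool starts_with_at(const std::string& raw_baseform) {
    return !raw_baseform.empty() && raw_baseform[0] == '@';
}
```

With this split, the leading-@ decision is made before escaping, so the escaping step can then be applied or skipped depending on that decision.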

Similarly, we don't want escapes added to ^notinbidix/@notinbidix$; they should only be output if they were there in the input.
I'm guessing escapes in input baseforms are just silently dropped before they end up in reading->baseform, meaning the difference between ^\@/\@<thesymbol>$ and ^notinbidix/@notinbidix$ is lost.

Would it be possible (on parsing the stream) to store something on Cohort (or Reading) to mark it as an unknown? (Or should it be implicitly represented like "has no readings"?)
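As a rough illustration of the "store something while parsing" idea, the sketch below records whether the baseform started with an unescaped @ at the moment the escapes are stripped, and uses that flag again when printing. `ParsedBaseform`, `parse_baseform` and `print_baseform` are hypothetical names, not cg3's Cohort or Reading types, and only @ is re-escaped here to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for what a reading could carry; not cg3's Reading.
struct ParsedBaseform {
    std::string text;             // baseform with backslash escapes removed
    bool unknown_marker = false;  // true if the input began with an unescaped '@'
};

// Strip escapes, but remember whether the first character was a bare '@'.
ParsedBaseform parse_baseform(const std::string& in) {
    ParsedBaseform out;
    out.unknown_marker = !in.empty() && in[0] == '@';
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size()) {
            ++i;  // drop the backslash, keep the next character literally
        }
        out.text += in[i];
    }
    return out;
}

// Re-escape every '@' except a leading one that was bare in the input.
std::string print_baseform(const ParsedBaseform& bf) {
    std::string out;
    for (std::size_t i = 0; i < bf.text.size(); ++i) {
        const bool bare_leading_at = (i == 0 && bf.unknown_marker);
        if (bf.text[i] == '@' && !bare_leading_at) {
            out += '\\';
        }
        out += bf.text[i];
    }
    return out;
}
```

Under these assumptions, `\@` round-trips as `\@` and `@notinbidix` round-trips as `@notinbidix`, so the distinction between the two inputs survives the parse.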

Works great, thanks =D