apertium/apertium-recursive

How to match non-ascii lemmas?


$ cat foo.rtx
ij: _;
IJ: _;

IJ ->
    %ij
    (
        if (1.lemh/sl = "a" )
        { MATCHED@a.[1.lemh/sl] }
        el-if (1.lemh/sl = "æ" )
        { MATCHED@æ.[1.lemh/sl] }
        else
        { NO_MATCH@x.[1.lemh/sl] }
    )
    ;
$ rtx-comp foo.rtx foo.bin && echo '^a<ij>/a<ij>$ ^æ<ij>/æ<ij>$' | rtx-proc -r foo.bin

Applying rule 1 (line 5): ^a<ij>/a<ij>$

Applying output rule 0 (line 5): a<IJ> -> ^a<ij>/a<ij>$

No rule specified: ^MATCHED<a>a$
^MATCHED<a>$
Applying rule 1 (line 5): ^æ<ij>/æ<ij>$

Applying output rule 0 (line 5): æ<IJ> -> ^æ<ij>/æ<ij>$

No rule specified: ^NO_MATCH<x>æ$
^NO_MATCH<x>$

The ASCII lemma "a" matches, but "æ" falls through to the else branch. Is there some special trick?

$ cat bar.rtx
ij: _;
IJ: _;

IJ ->
    ij
    ?(1.lemh/sl = "æ" )
    { MATCHED@x }
    ;
$ rtx-comp bar.rtx bar.bin && echo '^æ<ij>/æ<ij>$' | rtx-proc -s bar.bin
int 1
pushinput
string
 -> lemh
sourceclip
 -> æ
dup
string
 ->
equal
 -> false
jumponfalse
 -> false, jumping
string
 -> æ
equal
 -> false
jumpontrue
 -> false
rejectrule
^æ<ij>$

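There's our æ: the trace shows it both as the source lemma and as the string literal, yet `equal` still returns false.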

$ echo 'å:_; å -> å{1};' > x; rtx-comp -s x b
å -> å

So everything non-ASCII in the rule file is misdecoded, pattern names included, not just quoted lemmas.
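For anyone wondering why a misdecoded literal can never match: here is a minimal Python sketch of one common way this goes wrong, where the UTF-8 bytes of the rule file get read one byte per character (as if Latin-1). Whether rtx-comp did exactly this is an assumption on my part, not something the traces above confirm, and `misdecode_utf8_as_latin1` is a hypothetical helper, not part of any Apertium tool.

```python
def misdecode_utf8_as_latin1(text: str) -> str:
    """Hypothetical helper: encode as UTF-8, then treat every byte as one
    Latin-1 character, the way a byte-oriented reader might (assumption)."""
    return text.encode("utf-8").decode("latin-1")


# The lemma as it arrives in the input stream, decoded correctly.
input_lemma = "æ"

# The same literal from the rule file, after the assumed misdecoding.
compiled_literal = misdecode_utf8_as_latin1("æ")

# One character became two, so the comparison fails.
print(len(input_lemma), len(compiled_literal))
print(input_lemma == compiled_literal)

# ASCII survives unchanged, which would explain why "a" still matched.
print(misdecode_utf8_as_latin1("a") == "a")
```

Under that assumption the ASCII branch keeps working while every non-ASCII comparison silently fails, which is the pattern seen in foo.rtx.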

Commit 15d4cf2 seems to have fixed it.