How to match non-ASCII lemmas?
Closed this issue · 4 comments
unhammer commented
$ cat foo.rtx
ij: _;
IJ: _;
IJ ->
%ij
(
if (1.lemh/sl = "a" )
{ MATCHED@a.[1.lemh/sl] }
el-if (1.lemh/sl = "æ" )
{ MATCHED@æ.[1.lemh/sl] }
else
{ NO_MATCH@x.[1.lemh/sl] }
)
;
$ rtx-comp foo.rtx foo.bin && echo '^a<ij>/a<ij>$ ^æ<ij>/æ<ij>$' | rtx-proc -r foo.bin
Applying rule 1 (line 5): ^a<ij>/a<ij>$
Applying output rule 0 (line 5): a<IJ> -> ^a<ij>/a<ij>$
No rule specified: ^MATCHED<a>a$
^MATCHED<a>$
Applying rule 1 (line 5): ^æ<ij>/æ<ij>$
Applying output rule 0 (line 5): æ<IJ> -> ^æ<ij>/æ<ij>$
No rule specified: ^NO_MATCH<x>æ$
^NO_MATCH<x>$
unhammer commented
Is there some special trick to matching a non-ASCII lemma in a condition?
unhammer commented
$ cat bar.rtx
ij: _;
IJ: _;
IJ ->
ij
?(1.lemh/sl = "æ" )
{ MATCHED@x }
;
$ rtx-comp bar.rtx bar.bin && echo '^æ<ij>/æ<ij>$' | rtx-proc -s bar.bin
int 1
pushinput
string
-> lemh
sourceclip
-> æ
dup
string
->
equal
-> false
jumponfalse
-> false, jumping
string
-> æ
equal
-> false
jumpontrue
-> false
rejectrule
^æ<ij>$
There's our æ: it shows up both in the clipped lemma and in the string literal, yet `equal` still returns false.
unhammer commented
$ echo 'å:_; å -> å{1};' > x; rtx-comp -s x b
å -> å
So it looks like everything is misdecoded when the rule file is compiled.
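For what it's worth, `Ã¥` is what you get when the two UTF-8 bytes of `å` are each read as a separate Latin-1 character. A minimal Python sketch of that effect follows. It is only an illustration: `misdecode` is a made-up helper, and reading bytes as Latin-1 is an assumption about what goes wrong, not a description of rtx-comp's code.

```python
# Hypothetical illustration of byte-per-character misdecoding.
# This is NOT rtx-comp's code; the Latin-1 reading is an assumption.

def misdecode(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode every byte as one Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")

# The rule from the last transcript, as it would look after misdecoding.
print(misdecode("å -> å"))

# A string literal stored this way no longer equals the lemma from the input stream.
literal_in_rule = misdecode("æ")
lemma_from_input = "æ"
print(literal_in_rule == lemma_from_input)

# Pure ASCII is unaffected, because each character is a single byte in UTF-8.
print(misdecode("a") == "a")
```

If that assumption holds, it would explain why the `"a"` comparison matches while the `"æ"` one does not.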