apertium/apertium-separable

Issue with capital letters


This works:

        <e lm="Jun Ajpu" c="">
            <p>
                <l>jun<s n="num"/><j/>ajpu<s n="np"/><s n="ant"/><s n="m"/><j/></l>
                <r>Jun<b/>Ajpu<s n="np"/><s n="ant"/><s n="m"/></r>
            </p>
        </e>

        $ echo "^Jun<num>$ ^Ajpu<np><ant><m>$ " | lsx-proc quc-spa.autoseq.bin
        ^Jun Ajpu<np><ant><m>$

But this doesn't:

        <e lm="Jun Ajpu" c="">
            <p>
                <l>Jun<s n="num"/><j/>Ajpu<s n="np"/><s n="ant"/><s n="m"/><j/></l>
                <r>Jun<b/>Ajpu<s n="np"/><s n="ant"/><s n="m"/></r>
            </p>
        </e>

        $ echo "^Jun<num>$ ^Ajpu<np><ant><m>$ " | lsx-proc quc-spa.autoseq.bin
        ^Jun<num>$ ^Ajpu<np><ant><m>$

This looks like it's an issue in lttoolbox/fst_processor.cc:

        else if(val > 0)
        {
          int val_lowercase = towlower(val);
          s.step_override(val_lowercase, alphabet(L"<ANY_CHAR>"), val); // FIXME deal with cases! in step_override
        }

Oops, I noted how to fix this issue and GitHub proceeded to close it.

If apertium/lttoolbox#100 gets merged, then in this snippet

        int val = towlower(lu[i]);
        if(lu[i] == L'\\')
        {
          i++;
          val = lu[i];
        }
        s.step_override(val, any_char, lu[i]);

we just need to change the final call from s.step_override(val, any_char, lu[i]); to s.step_override(lu[i], val, any_char, lu[i]); and it should work, I think.

@ftyers As of a641a6b, your example works (in fact, it's the test case), though it no longer matches the all-lowercase input ^jun<num>$ ^ajpu<np><ant><m>$. Do you foresee that being a problem?

Closing this, since the behavior is now consistent with lt-proc.
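
For anyone landing here later, here is a rough sketch of the two matching rules discussed in this thread. It is a toy model only: `Mode` and `matches` are hypothetical names made up for illustration, not part of lttoolbox or apertium-separable, and the real code walks a transducer rather than comparing strings. It assumes ASCII-range input so that `std::towlower` behaves the same in any locale.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>

// Hypothetical toy model, NOT the lttoolbox API.
// `entry` stands for the surface form written in the <l> side of the
// dictionary; `input` is the surface form read from the stream.
enum class Mode
{
  LowercaseOnly,      // old behaviour: the input is lowercased before stepping
  ExactThenLowercase  // new behaviour: accept the original character or its lowercase form
};

bool matches(const std::wstring& entry, const std::wstring& input, Mode mode)
{
  if(entry.size() != input.size())
  {
    return false;
  }
  for(std::size_t i = 0; i < entry.size(); i++)
  {
    wchar_t original = input[i];
    wchar_t lowered = static_cast<wchar_t>(std::towlower(original));
    bool ok;
    if(mode == Mode::LowercaseOnly)
    {
      // Only the lowercased input is compared, so a capitalised
      // dictionary entry can never be reached.
      ok = (entry[i] == lowered);
    }
    else
    {
      // The original character is accepted as well as its lowercase form.
      ok = (entry[i] == original) || (entry[i] == lowered);
    }
    if(!ok)
    {
      return false;
    }
  }
  return true;
}
```

Under this model, an entry "Jun" fails against the input "Jun" in `LowercaseOnly` mode (the original bug) and succeeds in `ExactThenLowercase` mode, while the entry "Jun" against the input "jun" fails in both modes, which is the lowercase case asked about above.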