apertium/apertium-separable

Doesn't work for simple example


<dictionary type="sequential">
<sdefs>
<sdef n="det"/>
<sdef n="abl"/>
<sdef n="dem"/>
<sdef n="n"/>
<sdef n="cnjadv"/>
</sdefs>
<section id="main" type="standard">
<e><p><l>bu<s n="det"/><s n="dem"/><j/>yüz<s n="n"/><s n="abl"/></l>
      <r>bu<b/>yüzden<s n="cnjadv"/></r></p></e>
</section>
</dictionary>

Then compile:

$ lsx-comp lr apertium-tur-uzb.tur-uzb.lsx tur-uzb.autosep.bin
main@standard 11 10

Show the transducer:

$ lt-print tur-uzb.autosep.bin
0	1	b	b	0.000000	
1	2	u	u	0.000000	
2	3	<det>	 	0.000000	
3	4	<dem>	y	0.000000	
4	5	<$>	ü	0.000000	
5	6	y	z	0.000000	
6	7	ü	d	0.000000	
7	8	z	e	0.000000	
8	9	<n>	n	0.000000	
9	10	<abl>	<cnjadv>	0.000000	
10	0.000000

But it doesn't work:

$ echo "^bu<det><dem>$ ^yüz<n><abl>$" | lsx-proc tur-uzb.autosep.bin 
^bu<det><dem>$ ^yüz<n><abl>$

Expected output is:

^bu yüzden<cnjadv>$

@jonorthwash @itang1 @unhammer any ideas?

This is exactly the sort of problem we were having with apertium/apertium-eng-deu#4. We never really reported it officially.

I hope this issue gets solved.

You might need <t> and/or <g> somewhere. See the examples in a comment on #2.

It shouldn't need <t> because the tags are fixed. <g> is only needed if there is a #, right? Or does it have another meaning?

@xavivars, @hectoralos, any thoughts about what @ftyers is doing wrong here?

I don't really know much about apertium-separable beyond having fixed the null-flushing (I hope!). Honestly, I know very little about the format.

On the issue you linked to, I think I just played with Hector's rule until it worked...

Unfortunately, my knowledge of the module is just as limited (and it needs better documentation). I just replicated some constructions Fran wrote in fra-cat. I've been comparing what is used in apertium-fra-cat with what is in this example, and fra-cat has a couple of <t/>. I tried adding <t/> to see whether it helps, but I still couldn't match ^bu<det><dem>$ ^yüz<n><abl>$.

@ftyers, try the code I committed in apertium/apertium-tur-uzb@5081938. It works for me now.

Basically I just added <j/>.
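Why a trailing <j/> helps can be seen from the transducer printed earlier in the thread. Below is a simplified Python model, not lsx-proc's actual matching code: it assumes each `$` in the input stream is read as a `<$>` symbol (the same symbol the inner <j/> became on the 4->5 arc) and that the blank between lexical units is skipped. Under those assumptions the original transducer runs out of arcs in state 10, just before the `<$>` that closes `yüz<n><abl>`. The `ARCS_WITH_J` variant assumes <j/> was appended to both <l> and <r>, so the compiler pairs them as one extra `<$>:<$>` arc; it then consumes the whole input.

```python
import re

# Arcs transcribed from the lt-print output above: (source, target, input, output).
# State 10 is the only final state.
ARCS_ORIGINAL = [
    (0, 1, "b", "b"),
    (1, 2, "u", "u"),
    (2, 3, "<det>", " "),
    (3, 4, "<dem>", "y"),
    (4, 5, "<$>", "ü"),  # the inner <j/> compiled to <$>
    (5, 6, "y", "z"),
    (6, 7, "ü", "d"),
    (7, 8, "z", "e"),
    (8, 9, "<n>", "n"),
    (9, 10, "<abl>", "<cnjadv>"),
]
# Assumption: a trailing <j/> on both sides adds one more <$>:<$> arc.
ARCS_WITH_J = ARCS_ORIGINAL + [(10, 11, "<$>", "<$>")]


def lu_symbols(stream: str) -> list[str]:
    """Split a stream into input symbols: lemma characters, whole tags, and
    <$> for each closing `$`. Blanks between lexical units are dropped (assumption)."""
    symbols: list[str] = []
    for lemma, tags in re.findall(r"\^([^<$]*)((?:<[^>]+>)*)\$", stream):
        symbols.extend(lemma)
        symbols.extend(re.findall(r"<[^>]+>", tags))
        symbols.append("<$>")
    return symbols


def walk(arcs, symbols):
    """Follow the (deterministic) arcs from state 0 until a symbol has no arc.
    Returns (state reached, number of symbols consumed, concatenated output)."""
    delta = {(src, sym): (dst, out) for src, dst, sym, out in arcs}
    state, output = 0, []
    for consumed, sym in enumerate(symbols):
        if (state, sym) not in delta:
            return state, consumed, "".join(output)
        state, piece = delta[(state, sym)]
        output.append(piece)
    return state, len(symbols), "".join(output)


symbols = lu_symbols("^bu<det><dem>$ ^yüz<n><abl>$")
print(len(symbols))                  # 11, the last one being <$>
print(walk(ARCS_ORIGINAL, symbols))  # stops in state 10 with the final <$> unread
print(walk(ARCS_WITH_J, symbols))    # reaches state 11 having read all 11 symbols
```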

Great, that should definitely go in the documentation, or alternatively the compiler should be updated to automatically add <j/> at the end of every entry.
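The compiler-side fix suggested here could look roughly like the following pre-pass. This is a hypothetical sketch (`append_trailing_j` is not part of apertium-separable or lsx-comp), assuming that both the <l> and <r> side of every entry should end in <j/>, and leaving sides that already end in <j/> untouched so the pass is idempotent. It uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def append_trailing_j(root: ET.Element) -> int:
    """Hypothetical pre-pass: ensure every <l> and <r> under `root` ends in <j/>.
    Returns how many sides were changed."""
    # Collect first so the tree is not modified while it is being iterated.
    sides = [el for el in root.iter() if el.tag in ("l", "r")]
    changed = 0
    for side in sides:
        last = side[-1] if len(side) else None
        # Already terminated: last child is <j/> with no text after it.
        if last is not None and last.tag == "j" and not (last.tail or "").strip():
            continue
        ET.SubElement(side, "j")  # appended after any trailing text
        changed += 1
    return changed


ENTRY = (
    '<e><p><l>bu<s n="det"/><s n="dem"/><j/>yüz<s n="n"/><s n="abl"/></l>'
    '<r>bu<b/>yüzden<s n="cnjadv"/></r></p></e>'
)
entry = ET.fromstring(ENTRY)
print(append_trailing_j(entry))  # 2: both <l> and <r> gained a final <j/>
print(ET.tostring(entry, encoding="unicode"))
```

For a whole dictionary the same function could be applied to `ET.parse(path).getroot()` before writing the tree back with `tree.write(...)`; note that ElementTree's default parser drops XML comments.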

I would file a new issue about entries not working without a trailing <j/>, suggesting that solution.