coli-saar/alto

EM yields incorrect results

alexanderkoller opened this issue · 2 comments

Three tests regarding EM fail with the current version of Alto. @jgroschwitz could you check if this is caused by your recent refactoring of EM and fix it?

  <testcase name="testEmCD" classname="de.up.ling.irtg.InterpretedTreeAutomatonTest" time="0.018">
    <failure message="java.lang.AssertionError: expected 0.4285714286, got 0.5" type="java.lang.AssertionError">java.lang.AssertionError: expected 0.4285714286, got 0.5

  <testcase name="testEmEF" classname="de.up.ling.irtg.InterpretedTreeAutomatonTest" time="0.022">
    <failure message="java.lang.AssertionError: expected 0.3, got 0.25" type="java.lang.AssertionError">java.lang.AssertionError: expected 0.3, got 0.25

  <testcase name="testEmAbab" classname="de.up.ling.irtg.InterpretedTreeAutomatonTest" time="0.005">
    <failure message="java.lang.AssertionError: expected 0.5, got 0.3333333333333333" type="java.lang.AssertionError">java.lang.AssertionError: expected 0.5, got 0.3333333333333333

I'll have a look

I just fixed it, was a silly thing. The EM implementation stops after a fixed amount of iterations or after likelihood improvements are under a threshold. The default implementation specifies 0 iterations with the intention that only the threshold should count. I had changed the underlying function such that the number of iterations must be negative to have that effect; i.e. it is now possible to run actually 0 iterations of EM. But I hadn't changed the default implementation, so all the tests ran literally 0 iterations of EM. I now changed the default implementation to use iterations=-1 and clarified the documentation. Sorry I left this around so long!