kiminoa/inflection-finder

prefixes like A- are getting lost due to clustering mechanism

Closed this issue · 4 comments

I should be seeing the following in the inflection candidates for the HT (Hagia Triada) Linear A tablets, but am not. Need to investigate why.

A- with evidence from:
A-PA-RA-NE,PA-RA-NE
SA-RA2,A-SA-RA2
KA-RU,A-KA-RU
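To make the expected result concrete, here is a minimal sketch of how the three pairs above each point to A-. It is not the project's actual code: `prefix_evidence` is a hypothetical helper, and it assumes words are hyphen-separated sign sequences and that a prefix is whatever leading signs remain when the shorter word matches the tail of the longer one.

```python
from collections import defaultdict

def prefix_evidence(pairs):
    """Group word pairs by the leading sign(s) that separate them (hypothetical helper)."""
    evidence = defaultdict(list)
    for first, second in pairs:
        # Pairs may arrive in either order, so put the longer sign sequence first.
        longer, shorter = sorted(
            (first.split("-"), second.split("-")), key=len, reverse=True
        )
        extra = len(longer) - len(shorter)
        # A prefix candidate exists only if the shorter word is the tail of the longer one.
        if extra > 0 and longer[extra:] == shorter:
            prefix = "-".join(longer[:extra]) + "-"
            evidence[prefix].append(("-".join(longer), "-".join(shorter)))
    return dict(evidence)

pairs = [
    ("A-PA-RA-NE", "PA-RA-NE"),
    ("SA-RA2", "A-SA-RA2"),
    ("KA-RU", "A-KA-RU"),
]
print(prefix_evidence(pairs))
```

Under these assumptions all three pairs land under the single key `A-`, which is the candidate that should be showing up.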

A- is getting dropped, with A-KA-RU as its only instance. However, the A-PA-RA-NE,PA-RA-NE pair should lend support to this ... does it never get clustered into its own group? It appears alongside PA-RA and A-RA-NA-RE, which may be why it's getting shunned. Hm.

May just need to tune the clustering algorithm some more. Investigating ...

A-SA-RA2, SA-RA2, SA-RA-RA

So maybe it's worth retaining matches when only some members of a cluster exhibit the pattern. Hm.

Fixed by including pairs from clustering in processing; this is a better approach anyway because it lets us treat the clustering more like a guideline and less like holy writ.
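A rough sketch of the idea behind the fix, not the committed code: rather than requiring a whole cluster to exhibit a pattern, look at every pair of words inside each cluster and keep any pair that differs by leading signs. The names `prefix_of` and `prefixes_from_clusters` are hypothetical, and the cluster memberships below are illustrative, loosely based on the words mentioned in this thread.

```python
from collections import defaultdict
from itertools import combinations

def prefix_of(a, b):
    """Return (prefix, longer, shorter) if one word is the other plus leading signs, else None."""
    longer, shorter = sorted((a.split("-"), b.split("-")), key=len, reverse=True)
    extra = len(longer) - len(shorter)
    if extra > 0 and longer[extra:] == shorter:
        return "-".join(longer[:extra]) + "-", "-".join(longer), "-".join(shorter)
    return None

def prefixes_from_clusters(clusters):
    """Collect prefix evidence from every pair within each cluster (hypothetical helper)."""
    found = defaultdict(set)
    for cluster in clusters:
        # Pairwise comparison: one odd member no longer sinks the whole cluster.
        for a, b in combinations(cluster, 2):
            match = prefix_of(a, b)
            if match:
                prefix, longer, shorter = match
                found[prefix].add((longer, shorter))
    return {prefix: sorted(pairs) for prefix, pairs in found.items()}

# Illustrative clusters; the real clustering output may differ.
clusters = [
    ["A-SA-RA2", "SA-RA2", "SA-RA-RA"],
    ["A-PA-RA-NE", "PA-RA-NE", "PA-RA", "A-RA-NA-RE"],
    ["A-KA-RU", "KA-RU"],
]
print(prefixes_from_clusters(clusters))
```

With these illustrative clusters, A- picks up evidence from three pairs even though SA-RA-RA, PA-RA and A-RA-NA-RE sit in the same clusters and do not fit the pattern.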