prefixes like A- are getting lost due to clustering mechanism
Closed this issue · 4 comments
I should be seeing the following in the HT LA tablets inflection candidates, but am not. Need to investigate why.
A- with evidence from:
A-PA-RA-NE,PA-RA-NE
SA-RA2,A-SA-RA2
KA-RU,A-KA-RU
A- is getting dropped with A-KA-RU as its only instance. However, the A-PA-RA-NE,PA-RA-NE pair should also count as evidence for it ... does it never get clustered into its own group? It appears with PA-RA and A-RA-NA-RE, which may be why it's getting shunned. Hm.
May just need to tune the clustering algorithm some more. Investigating ...
A-SA-RA2, SA-RA2, SA-RA-RA
So maybe it's worth retaining matches when only some members of a cluster exhibit the pattern. Hm.
Fixed by including pairs from clustering in processing; this is a better approach anyway because it lets us treat the clustering more like a guideline and less like holy writ.
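For anyone reading later, here is a rough sketch of the pairwise idea, not the actual code from the fix. It assumes words are hyphen-separated sign sequences and that a prefix counts as attested whenever one word in a cluster equals another word plus leading signs. The names `prefix_of` and `prefix_evidence` are made up for illustration, and the cluster contents are guessed from the words mentioned in this thread.

```python
from collections import defaultdict
from itertools import combinations


def prefix_of(longer, shorter):
    """Return the leading signs of `longer` (e.g. "A-") if `shorter`
    is exactly its tail, otherwise None."""
    a, b = longer.split("-"), shorter.split("-")
    if len(a) > len(b) and a[-len(b):] == b:
        return "-".join(a[:len(a) - len(b)]) + "-"
    return None


def prefix_evidence(clusters):
    """Collect prefix evidence from every pair inside each cluster,
    rather than requiring all members of a cluster to share the pattern."""
    evidence = defaultdict(set)
    for cluster in clusters:
        for w1, w2 in combinations(cluster, 2):
            # Try both orderings, since either word may be the longer one.
            for longer, shorter in ((w1, w2), (w2, w1)):
                prefix = prefix_of(longer, shorter)
                if prefix:
                    evidence[prefix].add((longer, shorter))
    return dict(evidence)


# Hypothetical clusters, assembled from the words quoted in this issue.
clusters = [
    ["A-PA-RA-NE", "PA-RA-NE", "PA-RA", "A-RA-NA-RE"],
    ["A-SA-RA2", "SA-RA2", "SA-RA-RA"],
    ["KA-RU", "A-KA-RU"],
]

evidence = prefix_evidence(clusters)
for prefix, pairs in evidence.items():
    print(prefix, sorted(pairs))
```

With pairs as the unit of evidence, the extra cluster members (PA-RA, A-RA-NA-RE, SA-RA-RA) no longer suppress the A- match; they simply contribute no pairs of their own.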