Normalization of すみません and すいません differs
rsimmons opened this issue · 1 comment
Is this the right place to report linguistic issues with the dictionaries? Apologies if not.
Using Sudachi 0.4.3, the core dictionary version 20200722, and mode C, I noticed that すみません and すいません do not normalize to the same verb, and it seems like they should.
For すいません, the normalized verb is 済む, which seems correct:
すい 動詞,一般,*,*,五段-マ行,連用形-イ音便 済む
ませ 助動詞,*,*,*,助動詞-マス,未然形-一般 ます
ん 助動詞,*,*,*,助動詞-ヌ,終止形-撥音便 ず
For すみません, the normalized verb is すむ. It seems like it should be 済む also?
すみ 動詞,一般,*,*,五段-マ行,連用形-一般 すむ
ませ 助動詞,*,*,*,助動詞-マス,未然形-一般 ます
ん 助動詞,*,*,*,助動詞-ヌ,終止形-撥音便 ず
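To make the discrepancy easy to check, here is a minimal sketch that pulls the normalized form out of each output line. It assumes the three-column, whitespace-separated layout shown above (surface, part of speech, normalized form); `normalized_forms` is a hypothetical helper name and not part of Sudachi.

```python
from typing import List


def normalized_forms(output: str) -> List[str]:
    """Return the last column (normalized form) of each output line."""
    forms = []
    for line in output.strip().splitlines():
        # Columns: surface, part-of-speech tags, normalized form.
        _surface, _pos, normalized = line.split(maxsplit=2)
        forms.append(normalized.strip())
    return forms


# Output copied from the two analyses above.
SUIMASEN = """\
すい 動詞,一般,*,*,五段-マ行,連用形-イ音便 済む
ませ 助動詞,*,*,*,助動詞-マス,未然形-一般 ます
ん 助動詞,*,*,*,助動詞-ヌ,終止形-撥音便 ず
"""

SUMIMASEN = """\
すみ 動詞,一般,*,*,五段-マ行,連用形-一般 すむ
ませ 助動詞,*,*,*,助動詞-マス,未然形-一般 ます
ん 助動詞,*,*,*,助動詞-ヌ,終止形-撥音便 ず
"""

# Only the verb (first token) differs between the two analyses.
verbs_differ = normalized_forms(SUIMASEN)[0] != normalized_forms(SUMIMASEN)[0]
```

The auxiliaries normalize identically in both cases, so the mismatch is confined to the verb.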
Thank you for asking about Sudachi Normalization.
すみません could be 済みません, 住みません, or 澄みません.
Sudachi does not normalize a word to one particular form when it could correspond to more than one word.
すみ 動詞,一般,*,*,五段-マ行,連用形-一般 すむ
ませ 助動詞,*,*,*,助動詞-マス,未然形-一般 ます
ん 助動詞,*,*,*,助動詞-ヌ,終止形-撥音便 ず
Therefore, normalizing すみ (動詞,一般,*,*,五段-マ行,連用形-一般) to すむ is the correct behavior.
In the same way, すい (動詞,一般,*,*,五段-マ行,連用形-イ音便) should be normalized to すむ, but it is currently normalized to 済む:
すい 動詞,一般,*,*,五段-マ行,連用形-イ音便 済む
ませ 助動詞,*,*,*,助動詞-マス,未然形-一般 ます
ん 助動詞,*,*,*,助動詞-ヌ,終止形-撥音便 ず
We will fix it in the next update.
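For illustration only, the rule described in this reply can be sketched as follows. This is not Sudachi's implementation or API: the `CANDIDATES` table and the `normalize` function are hypothetical, and the table contains just the candidates named above plus one assumed unambiguous entry.

```python
from typing import Dict, List

# Hypothetical lookup table: hiragana dictionary form -> candidate kanji forms.
# Illustrative data only; not taken from the Sudachi dictionary.
CANDIDATES: Dict[str, List[str]] = {
    "すむ": ["済む", "住む", "澄む"],  # ambiguous, as listed in this thread
    "たべる": ["食べる"],              # assumed unambiguous example
}


def normalize(dictionary_form: str) -> str:
    """Return a kanji form only when exactly one candidate exists."""
    candidates = CANDIDATES.get(dictionary_form, [])
    if len(candidates) == 1:
        return candidates[0]
    # Ambiguous or unknown: keep the hiragana form unchanged.
    return dictionary_form
```

Under this sketch, both すみ and すい would resolve to the dictionary form すむ and stay in hiragana, because すむ has three possible kanji forms.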