WorksApplications/SudachiDict

Normalization of すみません and すいません differs

rsimmons opened this issue · 1 comments

Is this the right place to report linguistic issues with the dictionaries? Apologies if not.

Using Sudachi 0.4.3, the core dictionary version 20200722, and mode C, I noticed that すみません and すいません do not normalize to the same verb, and it seems like they should.

For すいません, the normalized verb is 済む, which seems correct:

すい	動詞,一般,*,*,五段-マ行,連用形-イ音便	済む
ませ	助動詞,*,*,*,助動詞-マス,未然形-一般	ます
ん	助動詞,*,*,*,助動詞-ヌ,終止形-撥音便	ず

For すみません, the normalized verb is すむ. It seems like it should be 済む also?

すみ	動詞,一般,*,*,五段-マ行,連用形-一般	すむ
ませ	助動詞,*,*,*,助動詞-マス,未然形-一般	ます
ん	助動詞,*,*,*,助動詞-ヌ,終止形-撥音便	ず

Thank you for asking about Sudachi normalization.

すみません could be 済みません, 住みません or 澄みません.
When a surface form is ambiguous between several words like this, Sudachi does not normalize it to any particular one of them.

すみ	動詞,一般,*,*,五段-マ行,連用形-一般	すむ
ませ	助動詞,*,*,*,助動詞-マス,未然形-一般	ます
ん	助動詞,*,*,*,助動詞-ヌ,終止形-撥音便	ず

Therefore, normalizing すみ (動詞,一般,*,*,五段-マ行,連用形-一般) to すむ is the correct behavior.

In the same way, すい (動詞,一般,*,*,五段-マ行,連用形-イ音便) should also be normalized to すむ, not to 済む as in the current output:

すい	動詞,一般,*,*,五段-マ行,連用形-イ音便	済む
ませ	助動詞,*,*,*,助動詞-マス,未然形-一般	ます
ん	助動詞,*,*,*,助動詞-ヌ,終止形-撥音便	ず

We will fix it in the next update.
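
For readers following the thread, the rule described in this reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Sudachi's implementation: the `CANDIDATES` and `LEMMA_OF` tables and the `normalize` function are hypothetical names, and the たべる entry is an assumed example of an unambiguous word added for contrast.

```python
from typing import Dict, List

# Hypothetical toy lexicon (not Sudachi data): kana dictionary form -> possible written forms.
CANDIDATES: Dict[str, List[str]] = {
    "すむ": ["済む", "住む", "澄む"],  # ambiguous, as listed in the reply above
    "たべる": ["食べる"],              # assumed unambiguous entry, for contrast
}

# Hypothetical mapping from conjugated surface forms to their kana dictionary form.
LEMMA_OF: Dict[str, str] = {
    "すみ": "すむ",  # 連用形-一般
    "すい": "すむ",  # 連用形-イ音便
}


def normalize(surface: str) -> str:
    """Return a kanji form only when exactly one candidate exists; otherwise keep kana."""
    lemma = LEMMA_OF.get(surface, surface)
    candidates = CANDIDATES.get(lemma, [])
    if len(candidates) == 1:
        return candidates[0]
    return lemma


if __name__ == "__main__":
    for surface in ("すみ", "すい", "たべる"):
        print(surface, "->", normalize(surface))
```

Under this sketch both すみ and すい come out as すむ, which is the behavior described above as correct, while the assumed unambiguous entry is still mapped to its kanji form.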