tshatrov/ichiran

一箇年 and 堪へる are missing kana_text, causing internal server error

AnyhowStep opened this issue · 2 comments

I'm pretty sure the server is otherwise OK, because other searches work fine.

I looked at the DB dump in the latest release and noticed that those two words are missing kana_text rows.
So there was probably a bug in parsing JMdict.

The entry content clearly shows that there should be kana.

一箇年 (いっかねん)

<?xml version="1.0" encoding="UTF-8"?>
<entry>
	<ent_seq>1161240</ent_seq>
	<k_ele>
		<keb>一箇年</keb>
	</k_ele>
	<r_ele>
		<reb>いっかねん</reb>
		<re_inf>ok</re_inf>
	</r_ele>
	<sense>
		<pos>n</pos>
		<gloss xml:lang="eng">one year</gloss>
	</sense>
</entry>

堪へる (たへる)

<?xml version="1.0" encoding="UTF-8"?>
<entry>
	<ent_seq>2209300</ent_seq>
	<k_ele>
		<keb>堪へる</keb>
	</k_ele>
	<r_ele>
		<reb>たへる</reb>
		<re_inf>ok</re_inf>
	</r_ele>
	<sense>
		<pos>v1</pos>
		<pos>vi</pos>
		<pos>vt</pos>
		<xref>堪える・1</xref>
		<gloss xml:lang="eng">to bear</gloss>
		<gloss xml:lang="eng">to stand</gloss>
		<gloss xml:lang="eng">to endure</gloss>
		<gloss xml:lang="eng">to put up with</gloss>
	</sense>
</entry>

There might be other inconsistencies in the database (like entry.n_kana, entry.n_kanji, kana_text.nokanji, etc. being wrong) but I didn't check.
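
A rough way to check the count columns could look like the sketch below. It assumes, without having verified this against the ichiran schema, that entry.n_kanji and entry.n_kana are meant to hold the number of kanji and kana spellings of an entry. The function names and the `stored` dictionary are made up for illustration.

```python
import xml.etree.ElementTree as ET


def expected_counts(entry_xml):
    """Count kanji (k_ele) and reading (r_ele) elements in one JMdict entry."""
    entry = ET.fromstring(entry_xml)
    return {
        "n_kanji": len(entry.findall("k_ele")),
        "n_kana": len(entry.findall("r_ele")),
    }


def find_mismatches(entry_xml, stored):
    """Return the column names whose stored value differs from the XML count.

    `stored` stands in for a database row; its shape is an assumption.
    """
    expected = expected_counts(entry_xml)
    return sorted(
        name for name, value in expected.items() if stored.get(name) != value
    )


# Trimmed version of the 堪へる entry quoted in this issue.
SAMPLE = """<entry>
  <ent_seq>2209300</ent_seq>
  <k_ele><keb>堪へる</keb></k_ele>
  <r_ele><reb>たへる</reb><re_inf>ok</re_inf></r_ele>
</entry>"""

# Hypothetical stored row in which the only reading was dropped.
print(find_mismatches(SAMPLE, {"n_kanji": 1, "n_kana": 0}))
```

If the columns turn out to mean something else (for example, counts after filtering), the comparison would need to be adjusted accordingly.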

Thanks for spotting this. It seems that in JMdict the only kana spelling of each of these words is tagged [ok], which means "outdated or obsolete kana usage", and ichiran filters such spellings out. I'll add these spellings manually, I guess.
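
To illustrate the failure mode, here is a minimal Python sketch, not ichiran's actual code (ichiran is written in Common Lisp). It assumes the filter simply drops every reading whose re_inf is "ok", as the tag appears in the entries quoted above. The helper names are hypothetical. It flags entries that have readings in JMdict but would end up with none after filtering.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the 一箇年 entry quoted in this issue.
ENTRY_XML = """<entry>
  <ent_seq>1161240</ent_seq>
  <k_ele><keb>一箇年</keb></k_ele>
  <r_ele><reb>いっかねん</reb><re_inf>ok</re_inf></r_ele>
  <sense><pos>n</pos><gloss>one year</gloss></sense>
</entry>"""

OBSOLETE_TAG = "ok"  # "outdated or obsolete kana usage"


def all_readings(entry):
    """Return every kana reading listed in the entry."""
    return [r_ele.findtext("reb") for r_ele in entry.findall("r_ele")]


def usable_readings(entry):
    """Return readings not tagged as obsolete (assumed filter behaviour)."""
    return [
        r_ele.findtext("reb")
        for r_ele in entry.findall("r_ele")
        if OBSOLETE_TAG not in [inf.text for inf in r_ele.findall("re_inf")]
    ]


def loses_all_kana(entry):
    """True if the entry has readings but the filter removes every one."""
    return bool(all_readings(entry)) and not usable_readings(entry)


entry = ET.fromstring(ENTRY_XML)
print(entry.findtext("ent_seq"), loses_all_kana(entry))
```

Running a check like this over every entry in a JMdict file would list any other words that end up without kana_text rows for the same reason.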

fixed