No reading form for certain words
sorami opened this issue · 0 comments
>>> from sudachipy import tokenizer, dictionary
>>> tokenizer_obj = dictionary.Dictionary().create()
>>> [m.reading_form() for m in tokenizer_obj.tokenize("コンピュータ")]
['']
>>> [m.reading_form() for m in tokenizer_obj.tokenize("計算機")]
['ケイサンキ']
It should return the surface when the reading_form does not exist in the lexicon.
For example, the original Java implementation handles this in dictionary/WordInfoList.java:
WordInfo getWordInfo(int wordId) {
    ...
    String readingForm = bufferToString(buf);
    if (readingForm.isEmpty()) {
        readingForm = surface;
    }
    ...
}
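Until this is fixed in the library, the same fallback can be applied on the caller side. The following is only a minimal sketch, not the actual SudachiPy fix: `reading_or_surface` is a hypothetical helper name, and `FakeMorpheme` is a hypothetical stand-in so the snippet runs without a dictionary installed. It assumes only that a morpheme exposes `surface()` and `reading_form()`, as in the session above.

```python
def reading_or_surface(morpheme):
    """Return the reading form, or the surface if the reading is empty."""
    reading = morpheme.reading_form()
    # Mirror the Java fallback: an empty reading means the lexicon has none.
    return reading if reading else morpheme.surface()


class FakeMorpheme:
    """Hypothetical stand-in for a SudachiPy morpheme (illustration only)."""

    def __init__(self, surface, reading):
        self._surface = surface
        self._reading = reading

    def surface(self):
        return self._surface

    def reading_form(self):
        return self._reading


if __name__ == "__main__":
    # One word without a reading in the lexicon, one with a reading.
    samples = [FakeMorpheme("コンピュータ", ""), FakeMorpheme("計算機", "ケイサンキ")]
    print([reading_or_surface(m) for m in samples])
```

With a real tokenizer, the helper would replace the direct call: `[reading_or_surface(m) for m in tokenizer_obj.tokenize("コンピュータ")]`.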
Thanks to sig_m on the Slack channel for reporting this!