Fix cut_all mixed Chinese & English issue
messense opened this issue · 2 comments
messense commented
This should get the same fix as the Python version: fxsjy/jieba@97c3246
MnO2 commented
@messense: Code mixing is a hard problem; it comes down to where you draw the boundary of Chinese vocabulary. Product names can contain not only the English alphabet but also Japanese hiragana such as の. I would argue this is beyond the scope of a Chinese segmenter, but we can certainly apply a workaround like the one in the Python implementation for practical reasons.
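For illustration, here is a minimal Python sketch of what such a workaround could look like: keep each run of ASCII letters and digits together as one block instead of emitting it character by character. This is not the code from the referenced commit. The helper name `split_mixed` and the character ranges are assumptions made for the example.

```python
import re

# Hypothetical helper for illustration; not part of jieba's API.
# Assumption: "Chinese" means the CJK Unified Ideographs block U+4E00..U+9FFF.
TOKEN = re.compile(
    r"(?P<han>[\u4e00-\u9fff]+)"   # run of Han characters
    r"|(?P<ascii>[A-Za-z0-9]+)"    # run of ASCII letters/digits, kept whole
    r"|(?P<other>.)",              # anything else, one character at a time
    re.DOTALL,
)


def split_mixed(text):
    """Split text into (kind, run) pairs so that ASCII words stay intact.

    A segmenter could then apply dictionary lookup to the "han" runs only
    and pass the "ascii" runs through as single tokens.
    """
    return [(m.lastgroup, m.group(0)) for m in TOKEN.finditer(text)]


if __name__ == "__main__":
    print(split_mixed("我爱Python3编程"))
    print(split_mixed("午後の紅茶"))
```

With these ranges, `Python3` comes out as one `ascii` block, while the hiragana の falls into `other` and splits the surrounding Han runs. That is the boundary question raised above: handling such names would mean widening the ranges, which is a policy decision rather than a bug fix.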