messense/jieba-rs

Fix cut_all mixed Chinese & English issue

messense opened this issue · 2 comments

Same as the fix in the Python version: fxsjy/jieba@97c3246

MnO2 commented

@messense: Code mixing is a hard problem; it comes down to where you draw the boundary of Chinese vocabulary. Product names can contain not only the English alphabet but also Japanese hiragana, like . I would argue this is beyond the scope of a Chinese segmenter, but we can certainly apply a workaround like the one in the Python implementation for practical reasons.