hankcs/HanLP

Chinese word segmentation (coarse-grained) error: New in version 3.3.

wencan opened this issue · 1 comment

wencan commented

Describe the bug
Text:

New in version 3.3.

https://hanlp.hankcs.com/demos/tok.html?text=New+in+version+3.3.%0A%0A&coarse=true

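The trailing `.` is a sentence-ending period, but coarse-grained tokenization keeps `3.3.` together as a single token. Fine-grained tokenization does not have this problem.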
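To make the expected behavior concrete, here is a minimal post-processing sketch that splits a sentence-final period off a dotted number such as `3.3.`. It is not part of HanLP and not the maintainer's suggested fix. The function name `split_trailing_period` is hypothetical, and the input token list is assumed from the description above rather than copied from actual HanLP output.

```python
import re

# Hypothetical pattern: a dotted number such as "3.3" or "2.1.0"
# immediately followed by exactly one trailing period.
_DOTTED_NUMBER_WITH_PERIOD = re.compile(r"^(\d+(?:\.\d+)+)\.$")


def split_trailing_period(tokens):
    """Split a trailing '.' off dotted-number tokens like '3.3.'.

    Illustrative workaround only (not a HanLP API). It assumes the
    trailing period after such a number always ends the sentence.
    """
    fixed = []
    for tok in tokens:
        m = _DOTTED_NUMBER_WITH_PERIOD.match(tok)
        if m:
            # Emit the number and the period as two separate tokens.
            fixed.extend([m.group(1), "."])
        else:
            fixed.append(tok)
    return fixed


# Assumed coarse-grained output, reconstructed from the bug description.
observed = ["New", "in", "version", "3.3."]
print(split_trailing_period(observed))
# ['New', 'in', 'version', '3.3', '.']
```

Because the rule treats any trailing period after a dotted number as sentence-final, it is only a stopgap. An English model or a custom dictionary addresses the cause instead of patching the output.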

Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Describe the current behavior
A clear and concise description of what happened.

Expected behavior
A clear and concise description of what you expected to happen.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • Python version: N/A
  • HanLP version: latest (online demo)

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

  • I've completed this form and searched the web for solutions.
hankcs commented

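This falls under English tokenization rather than being a bug in Chinese word segmentation. I suggest using an English model or a custom dictionary.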