youxiangongsi is tokenized incorrectly
Opened this issue · 2 comments
zhmfan commented
{
"token": "you",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 0
},
{
"token": "xiang",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 1
},
{
"token": "o",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 2
},
{
"token": "n",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 3
},
{
"token": "g",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 4
},
{
"token": "si",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 5
}
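The tokens above are what a greedy longest-match split would produce: taking `xiang` first leaves `ongsi`, which cannot be covered by whole syllables, so the rest falls apart into single characters. Whether the plugin actually works this way is an assumption, not something confirmed in this thread. Below is a minimal sketch using a tiny hypothetical syllable table (`SYLLABLES`, `greedy_split`, and `backtracking_split` are illustrative names, not plugin APIs) that compares the greedy split with one that backtracks to find a full cover:

```python
# Hypothetical, deliberately tiny syllable table for illustration only;
# it is NOT the plugin's dictionary.
SYLLABLES = {"you", "xi", "xian", "xiang", "an", "ang", "o", "gong", "si"}
MAX_LEN = max(len(s) for s in SYLLABLES)


def greedy_split(text):
    """Longest match first; emit a single character when nothing matches."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            if text[i:i + size] in SYLLABLES:
                tokens.append(text[i:i + size])
                i += size
                break
        else:
            # No syllable starts here, so fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


def backtracking_split(text):
    """Return the first split (longer syllables tried first) that covers
    the whole input with known syllables, or None if there is none."""
    if not text:
        return []
    for size in range(min(MAX_LEN, len(text)), 0, -1):
        head = text[:size]
        if head in SYLLABLES:
            rest = backtracking_split(text[size:])
            if rest is not None:
                return [head] + rest
    return None


print(greedy_split("youxiangongsi"))
print(backtracking_split("youxiangongsi"))
```

With this table the greedy version reproduces the reported tokens (`you`, `xiang`, `o`, `n`, `g`, `si`), while the backtracking version yields `you`, `xian`, `gong`, `si`.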
renpengben commented
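I'm running into this problem too. The pinyin initials of 周星驰 (zhou xing chi), zxc, are tokenized into a bunch of single characters.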
shiwl0329 commented
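I've hit this as well. Suppose the pinyin is deliberately separated by spaces, e.g. ying lun mi an. The pinyin tokenizer should split it into ying lun mi an, not, as it does now, ying lun mian, with mi and an glued together.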
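The behaviour requested above amounts to treating whitespace as a hard boundary before any syllable matching happens. A minimal sketch of the difference, again with a hypothetical syllable table and illustrative function names (it is an assumption that the current output comes from spaces being dropped before matching):

```python
# Hypothetical syllable table for illustration only.
SYLLABLES = {"ying", "lun", "mi", "an", "mian"}
MAX_LEN = max(len(s) for s in SYLLABLES)


def split_chunk(chunk):
    """Greedy longest-match split of one whitespace-free chunk."""
    tokens, i = [], 0
    while i < len(chunk):
        for size in range(min(MAX_LEN, len(chunk) - i), 0, -1):
            if chunk[i:i + size] in SYLLABLES:
                tokens.append(chunk[i:i + size])
                i += size
                break
        else:
            # Unknown character: emit it on its own.
            tokens.append(chunk[i])
            i += 1
    return tokens


def tokenize_ignoring_spaces(text):
    """Drop spaces first, then split: syllables can merge across spaces."""
    return split_chunk(text.replace(" ", ""))


def tokenize_respecting_spaces(text):
    """Split on whitespace first, so user-supplied boundaries are kept."""
    return [tok for chunk in text.split() for tok in split_chunk(chunk)]


print(tokenize_ignoring_spaces("ying lun mi an"))
print(tokenize_respecting_spaces("ying lun mi an"))
```

The first call merges `mi` and `an` into `mian`; the second keeps the four syllables exactly as the user typed them.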