infinilabs/analysis-pinyin

Severe bug: when the text being tokenized contains a standalone letter A, the tokenizer drops it

Opened this issue · 1 comment

GET /_analyze
{
  "analyzer": "ik_smart",
  "text": "我们A A制"
}

{
  "tokens": [
    {
      "token": "我们",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "制",
      "start_offset": 5,
      "end_offset": 6,
      "type": "CN_CHAR",
      "position": 1
    }
  ]
}

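By default, ik loads a stopword dictionary, stopword.dic, which contains the letter 'a' (treated as a stop word in English), so that token is filtered out. Emptying /config/stopword.dic under the ik directory fixes this.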
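
If emptying the whole file is more than you want, a narrower option is to delete only the 'a' entry and keep the other stop words. The following is a minimal sketch of that idea, not something from the plugin itself: the helper names `remove_stopwords` and `rewrite_stopword_file` are made up for illustration, and it assumes the file is UTF-8 with one entry per line. The path in the final comment is a placeholder that depends on where ik is installed. The dictionary is probably read when the plugin loads, so a node restart is likely needed before the change shows up.

```python
from pathlib import Path


def remove_stopwords(lines, to_remove):
    """Return the lines whose trimmed, lowercased text is not in to_remove."""
    drop = {word.lower() for word in to_remove}
    return [line for line in lines if line.strip().lower() not in drop]


def rewrite_stopword_file(path, to_remove):
    """Rewrite a stopword file in place; return how many entries were dropped.

    Assumes UTF-8 text with one stop word per line (an assumption about the
    file format, not something confirmed in this thread).
    """
    file_path = Path(path)
    lines = file_path.read_text(encoding="utf-8").splitlines()
    kept = remove_stopwords(lines, to_remove)
    file_path.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return len(lines) - len(kept)


# In-memory demonstration: only the exact entry "a" is removed.
sample = ["a", "an", "and", "the"]
print(remove_stopwords(sample, ["a"]))  # ['an', 'and', 'the']

# Hypothetical usage against a real install (adjust the path to your setup):
# rewrite_stopword_file("<ik directory>/config/stopword.dic", ["a"])
```

Back up the original stopword.dic first, then re-run the GET /_analyze request above to check whether the A tokens now appear.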