How can I keep the "+" signs from "c++软件工程师" in the tokenized output? Why does the pinyin analyzer filter out symbols?
Maskvvv opened this issue · 0 comments
Maskvvv commented
GET /_analyze
{
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "pinyin",
      "keep_original": false,
      "keep_first_letter": false,
      "keep_full_pinyin": true,
      "none_chinese_pinyin_tokenize": true,
      "ignore_pinyin_offset": false
    }
  ],
  "text": [
    "c++软件工程师"
  ]
}
Result:
{
  "tokens" : [
    {
      "token" : "c",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "c",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "ruan",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "jian",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "gong",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "cheng",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "shi",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 6
    }
  ]
}
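
One thing that may be worth trying (not verified against this setup): the request above sets `keep_original` to false, and the plugin README describes that option as keeping the original input alongside the pinyin terms. Assuming an extra token holding the unmodified text is acceptable for the use case, the sketch below builds the same `_analyze` body with that flag flipped. `build_analyze_body` is a hypothetical helper introduced only for illustration, and it makes no network calls.

```python
import json


def build_analyze_body(text, keep_original=True):
    """Build an _analyze request body mirroring the one in this issue.

    Hypothetical helper: only the keep_original flag differs from the
    original request.
    """
    return {
        "tokenizer": "keyword",
        "filter": [
            {
                "type": "pinyin",
                # Assumption: emitting the original input next to the pinyin
                # terms is an acceptable way to keep "+" in the token stream.
                "keep_original": keep_original,
                "keep_first_letter": False,
                "keep_full_pinyin": True,
                "none_chinese_pinyin_tokenize": True,
                "ignore_pinyin_offset": False,
            }
        ],
        "text": [text],
    }


body = build_analyze_body("c++软件工程师")
# Print the JSON body so it can be pasted after `GET /_analyze`.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

This does not change how the filter treats "+" when it generates the pinyin terms themselves; it only asks for the unmodified input to be kept as an additional token, so whether that is enough depends on how the field is queried.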