First-letter search: mec does not match 木耳草
vancefantasy opened this issue · 3 comments
vancefantasy commented
Index configuration:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin"
        }
      },
      "tokenizer": {
        "my_pinyin": {
          "lowercase": "true",
          "keep_original": "false",
          "keep_first_letter": "true",
          "keep_separate_first_letter": "true",
          "type": "pinyin",
          "limit_first_letter_length": "64",
          "keep_full_pinyin": "true"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword",
        "fields": {
          "py": {
            "type": "text",
            "analyzer": "pinyin_analyzer",
            "search_analyzer": "pinyin_analyzer"
          }
        }
      }
    }
  }
}
Tokens produced at index time for 木耳草:
{
  "tokens": [
    {"token": "m", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0},
    {"token": "mu", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0},
    {"token": "e", "start_offset": 0, "end_offset": 0, "type": "word", "position": 1},
    {"token": "er", "start_offset": 0, "end_offset": 0, "type": "word", "position": 1},
    {"token": "c", "start_offset": 0, "end_offset": 0, "type": "word", "position": 2},
    {"token": "cao", "start_offset": 0, "end_offset": 0, "type": "word", "position": 2},
    {"token": "mec", "start_offset": 0, "end_offset": 0, "type": "word", "position": 2}
  ]
}
Tokens produced at search time for mec:
{
  "tokens": [
    {"token": "me", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0},
    {"token": "c", "start_offset": 0, "end_offset": 0, "type": "word", "position": 1},
    {"token": "mec", "start_offset": 0, "end_offset": 0, "type": "word", "position": 1}
  ]
}
At search time the analysis of mec includes the token me, so a phrase query returns nothing. Is there a solution?
medcl commented
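If pinyin produces multiple terms that overlap at the same position, it is simply not suited to phrase queries. A regular query should work instead: both the query and the index produce the term mec, so the document should be found.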
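To see why the phrase query fails while a plain query can succeed, here is a small Python simulation over the two token streams shown above. It roughly models a phrase query the way Lucene's MultiPhraseQuery treats stacked tokens (tokens at the same position are alternatives) and a plain match query as an OR over terms. Both are simplifications for illustration, not the actual Lucene code, and the function names are hypothetical.

```python
from collections import defaultdict

# (token, position) pairs copied from the analysis output above.
INDEX_TOKENS = [("m", 0), ("mu", 0), ("e", 1), ("er", 1),
                ("c", 2), ("cao", 2), ("mec", 2)]
QUERY_TOKENS = [("me", 0), ("c", 1), ("mec", 1)]


def phrase_matches(index_tokens, query_tokens):
    """Simplified phrase check: for some start position in the document, every
    query position must be filled by at least one of its stacked tokens at the
    same relative offset (roughly MultiPhraseQuery semantics)."""
    indexed = defaultdict(set)  # token -> positions where it was indexed
    for tok, pos in index_tokens:
        indexed[tok].add(pos)
    slots = defaultdict(set)  # query position -> alternative tokens
    for tok, pos in query_tokens:
        slots[pos].add(tok)
    base = min(slots)
    for start in {pos for _, pos in index_tokens}:
        if all(any(start + (qpos - base) in indexed[tok] for tok in alts)
               for qpos, alts in slots.items()):
            return True
    return False


def any_term_matches(index_tokens, query_tokens):
    """Plain match query, simplified: which query terms occur anywhere?"""
    indexed = {tok for tok, _ in index_tokens}
    return [tok for tok, _ in query_tokens if tok in indexed]


print(phrase_matches(INDEX_TOKENS, QUERY_TOKENS))    # False: "me" was never indexed
print(any_term_matches(INDEX_TOKENS, QUERY_TOKENS))  # ['c', 'mec']
```

A query that analyzes to the single token mec would pass the phrase check, since mec is indexed at position 2: `phrase_matches(INDEX_TOKENS, [("mec", 0)])` returns `True`. The single-letter term c also matches in the OR case, which may be one reason a plain query returns loosely related documents.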
vancefantasy commented
@medcl
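Thanks for the reply.
After switching from phrase to best_fields, the match scope is too broad: some unrelated results come back.
If I set the search analyzer to keyword_analyzer, the document is found, which solves the current case, but it introduces other problems; for example, searching muer no longer works. Quite tricky.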
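The keyword trade-off can be checked against the terms indexed for 木耳草 listed above. This sketch assumes keyword_analyzer behaves like Elasticsearch's built-in keyword analyzer, which emits the whole input unchanged as a single term; the helper names are hypothetical.

```python
# Terms indexed for 木耳草 by pinyin_analyzer (from the index-time output above).
INDEXED_TERMS = {"m", "mu", "e", "er", "c", "cao", "mec"}


def keyword_analyze(text):
    """Assumed keyword-analyzer behavior: the whole input becomes one term."""
    return [text]


def matching_terms(query_text):
    """Return the keyword-analyzed query terms that exist in the index."""
    return [term for term in keyword_analyze(query_text) if term in INDEXED_TERMS]


print(matching_terms("mec"))   # ['mec']: the combined first-letter term was indexed
print(matching_terms("muer"))  # []: only the syllables mu and er were indexed
```

The comment implies muer did match with the pinyin search analyzer; that analyzer presumably splits it back into syllables such as mu and er, so each search analyzer covers only part of the cases.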
yanjiali2020 commented
I used the medcl3 index from the examples:
POST /medcl3/_doc/lucy {"name":"敏感的心"}
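Searching for mingan is analyzed into ming/an and finds nothing; it never becomes min/gan, even though the document's tokens do include min and gan. Searching for mg works.
How can this be fixed?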
GET /medcl3/_validate/query?explain { "query": {"match": { "name.pinyin": "mingan" }} }
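The mingan case is a segmentation ambiguity: the same letters split into two valid pinyin sequences. The sketch below uses a tiny hypothetical syllable set and assumes a greedy longest-match segmenter, which is consistent with the ming/an split reported above but is not taken from the plugin's source. Enumerating every segmentation shows that min/gan, the reading of 敏感, is equally valid.

```python
# Tiny hypothetical subset of pinyin syllables, enough for this example.
SYLLABLES = {"an", "gan", "min", "ming"}


def greedy_split(text, syllables, max_len=6):
    """Assumed greedy longest-match segmentation; None if some part cannot be split."""
    out, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in syllables:
                out.append(text[i:j])
                i = j
                break
        else:
            return None  # no syllable starts at position i
    return out


def all_splits(text, syllables):
    """Every way to split text into syllables from the set."""
    if not text:
        return [[]]
    results = []
    for j in range(1, len(text) + 1):
        head = text[:j]
        if head in syllables:
            for rest in all_splits(text[j:], syllables):
                results.append([head] + rest)
    return results


print(greedy_split("mingan", SYLLABLES))  # ['ming', 'an']
print(all_splits("mingan", SYLLABLES))    # [['min', 'gan'], ['ming', 'an']]
```

A greedy segmenter commits to ming because it is the longer prefix, so the query never produces the min and gan terms that were indexed for 敏感.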