infinilabs/analysis-pinyin

Keeping pinyin from being split into multiple tokens has no effect when converting Chinese characters to pinyin

idawwei opened this issue · 0 comments

Description

Input: 测试123EDF. The pinyin should not be split into multiple tokens; the expected result is the single token "ceshi123EDF".

Actual result: the digits are split, EDF is split, and the pinyin is split into ce and shi.

Steps to reproduce

Index settings:
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin_tokenizer"
        }
      },
      "tokenizer": {
        "my_pinyin_tokenizer": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "none_chinese_pinyin_tokenize": true
        }
      }
    }
  }
}

Analyzer test:
GET /my_index/_analyze
{
  "analyzer": "pinyin_analyzer",
  "text": "理财123EDF"
}
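
For comparison, the plugin README documents `keep_joined_full_pinyin` and `keep_none_chinese_in_joined_full_pinyin` (its example is 刘德华2016 -> liudehua2016), which look closer to the expected single token than `keep_full_pinyin` does. A tokenizer definition along the following lines might be worth trying in place of `my_pinyin_tokenizer`. It is an untested sketch, not a confirmed fix for 7.16.2. The name `my_joined_pinyin_tokenizer` is hypothetical. Turning off `lowercase` (to keep EDF uppercase) and `keep_none_chinese` (to suppress standalone tokens for 123 and EDF) are assumptions about how those options interact with the joined output.

```json
{
  "my_joined_pinyin_tokenizer": {
    "type": "pinyin",
    "keep_first_letter": false,
    "keep_separate_first_letter": false,
    "keep_full_pinyin": false,
    "keep_joined_full_pinyin": true,
    "keep_none_chinese": false,
    "keep_none_chinese_in_joined_full_pinyin": true,
    "none_chinese_pinyin_tokenize": false,
    "lowercase": false
  }
}
```

This fragment would sit under `settings.analysis.tokenizer`, with the analyzer's `tokenizer` field pointing at the new name. The same `_analyze` request can then be used to check whether a single token comes back.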

Environment

  • Versions: [e.g. Elasticsearch 7.16.2]
  • analysis-pinyin 7.16.2