Setting to avoid splitting pinyin into multiple tokens has no effect when converting Chinese characters to pinyin
idawwei opened this issue · 0 comments
Description
Input text: 测试123EDF. The pinyin should not be split into multiple tokens; the expected result is the single token "ceshi123EDF".
Actual result: the digits are split off, EDF is split off, and the pinyin itself is split into ce, shi.
Steps to reproduce
Index settings:
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin_tokenizer"
        }
      },
      "tokenizer": {
        "my_pinyin_tokenizer": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "none_chinese_pinyin_tokenize": true
        }
      }
    }
  }
}
Analyzer test:
GET /my_index/_analyze
{
  "analyzer": "pinyin_analyzer",
  "text": "理财123EDF"
}
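For comparison, here is an untested sketch of a variant that may come closer to the expected single token. It relies on options listed in the analysis-pinyin README: `keep_joined_full_pinyin` (join the full pinyin into one term) and `keep_none_chinese_in_joined_full_pinyin` (keep digits and Latin letters inside that joined term), with `keep_full_pinyin` and `none_chinese_pinyin_tokenize` turned off. The README also describes `lowercase` as lowercasing non-Chinese letters, so it is set to `false` here on the assumption that `EDF` should stay uppercase. The index, analyzer, and tokenizer names are hypothetical, and the behaviour on 7.16.2 has not been verified.

```json
# Untested sketch; my_index_joined, pinyin_joined_analyzer and
# my_joined_pinyin_tokenizer are hypothetical names.
PUT /my_index_joined
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pinyin_joined_analyzer": {
          "tokenizer": "my_joined_pinyin_tokenizer"
        }
      },
      "tokenizer": {
        "my_joined_pinyin_tokenizer": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_none_chinese": false,
          "keep_none_chinese_in_joined_full_pinyin": true,
          "none_chinese_pinyin_tokenize": false,
          "lowercase": false
        }
      }
    }
  }
}

GET /my_index_joined/_analyze
{
  "analyzer": "pinyin_joined_analyzer",
  "text": "测试123EDF"
}
```

If the joined term still appears alongside separate `123` or `EDF` tokens, the `keep_none_chinese` and `keep_none_chinese_together` options are probably the ones to adjust next.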
Environment
- Elasticsearch 7.16.2
- analysis-pinyin 7.16.2