duydo/elasticsearch-analysis-vietnamese

_analyze not working with Vietnamese

Closed this issue · 2 comments

Docker: v4.18.0
Elasticsearch: 8.7.0
Vietnamese Analysis Plugin: master branch

curl -X PUT "elastic:123456@localhost:9200/my-vi-index-00002" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_vi_analyzer": {
          "tokenizer": "vi_tokenizer",
          "filter": [
            "lowercase",
            "ascii_folding"
          ]
        }
      },
      "filter": {
        "ascii_folding": {
          "type": "asciifolding",
          "preserve_original": true
        }
      }
    }
  }
}
'

curl -X GET "elastic:123456@localhost:9200/my-vi-index-00002/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer": "my_vi_analyzer",
  "text": "Cộng hòa Xã hội chủ nghĩa Việt Nam"
}
'

{"error":{"root_cause":[{"type":"x_content_parse_exception","reason":"[4:11] [analyze_request] failed to parse field [text]"}],"type":"x_content_parse_exception","reason":"[4:11] [analyze_request] failed to parse field [text]","caused_by":{"type":"json_parse_exception","reason":"Invalid UTF-8 middle byte 0x61\n at [Source: (org.elasticsearch.common.io.stream.ByteBufferStreamInput); line: 4, column: 20]"}},"status":400}

duydo commented

@letranhuudanh0030 I couldn't reproduce the error you mentioned.
My setup:

  • Docker 4.19.0
  • ES v8.7
  • Plugin: master branch (8.7)
[Screenshot: Screen Shot 2023-07-13 at 07 32 12]

letranhuudanh0030 commented

Thanks @duydo. It looks like my terminal window was sending the request body in an invalid encoding (not valid UTF-8). The same request works in Kibana Dev Tools.
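
For anyone who hits the same `Invalid UTF-8 middle byte 0x61` error, here is a minimal Python sketch of what may have happened and one way around it. It is an illustration under assumptions, not a confirmed diagnosis. The assumption is that the terminal encoded the text with a legacy single-byte code page (Latin-1 is used here only as an example), so `ò` became the single byte `0xF2` followed by `a` (`0x61`), which is not a valid UTF-8 sequence. The helper names `is_valid_utf8` and `build_analyze_request` are hypothetical, the URL is taken from the commands above, and authentication is left out.

```python
import json
import urllib.request

TEXT = "Cộng hòa Xã hội chủ nghĩa Việt Nam"


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def build_analyze_request(
    text: str,
    analyzer: str = "my_vi_analyzer",
    url: str = "http://localhost:9200/my-vi-index-00002/_analyze",
) -> urllib.request.Request:
    """Build (but do not send) an _analyze request with an explicit UTF-8 body.

    Hypothetical helper: credentials are omitted and would have to be
    added before sending the request to a secured cluster.
    """
    body = json.dumps(
        {"analyzer": analyzer, "text": text}, ensure_ascii=False
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",  # the _analyze API accepts POST as well as GET
    )


# Assumed failure mode: "hòa" encoded as Latin-1 gives the bytes 68 F2 61.
# 0xF2 starts a multi-byte UTF-8 sequence, but 0x61 ("a") cannot continue it.
legacy_bytes = "h\u00f2a".encode("latin-1")
print(is_valid_utf8(legacy_bytes))          # False
print(is_valid_utf8(TEXT.encode("utf-8")))  # True

# Building the request always yields a valid UTF-8 body, whatever the terminal uses.
request = build_analyze_request(TEXT)
print(is_valid_utf8(request.data))          # True
```

To actually send the request, pass `request` to `urllib.request.urlopen` against a running cluster. Dropping `ensure_ascii=False` makes `json.dumps` emit `\uXXXX` escapes, so the body is plain ASCII and no longer depends on the terminal or file encoding at all.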