WorksApplications/elasticsearch-sudachi

invalid system dictionary on es 6.5.4

yanghanxy opened this issue · 6 comments

Hi,

I get the following error when I try Sudachi with ES:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[iCasELc][127.0.0.1:9300][indices:admin/analyze[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "invalid system dictionary"
  },
  "status": 400
}

Here's some background info:
es version: elasticsearch-6.5.4
sudachi version: v6.5.4-1.3.1-SNAPSHOT
dict version: sudachi-dictionary-20201223-core
dict path: $ES_HOME/config/sudachi_tokenizer/system_core.dic
es command:

PUT sudachi_sample
{
  "settings": {
    "index": {
      "analysis": {
        "tokenizer": {
          "sudachi_tokenizer": {
            "type": "sudachi_tokenizer"
          }
        },
        "analyzer": {
          "sudachi_analyzer": {
            "tokenizer": "sudachi_tokenizer",
            "type": "custom"
          }
        }
      }
    }
  }
}

GET sudachi_sample/_analyze
{
  "analyzer": "sudachi_analyzer",
  "text": "関西国際空港"
}

I tried a previous version of the dictionary, and it works.
dict version: sudachi-dictionary-20191224

I'm not sure whether there is a compatibility issue between sudachi-es and the latest dictionary.

Hi, thank you for your report.
I reproduced it. We will investigate the cause.

Sudachi-0.2.0, which is included in v6.5.4-1.3.1-SNAPSHOT, is old and does not support recent dictionaries.
You can probably build a plugin for 6.5.4 from the es6.8 branch as follows:

./gradlew -PelasticsearchVersion=6.5.4 build

@kazuma-t Thanks for your reply.
I built the plugin with your command, but ES rejects it as an "Unknown plugin" when I install it.
Do you have any suggestions for what is missing?

(base) ➜  elasticsearch-sudachi-es6.8 ./gradlew -PelasticsearchVersion=6.5.4 build

> Task :buildTestDict
reading the source file... 39 words
writing the POS table... 266 bytes
writing the connection matrix... 204 bytes
building the trie.........done
writing the trie... 3,076 bytes
writing the word-ID table... 198 bytes
writing the word parameters... 238 bytes
writing the wordInfos... 2,611 bytes
writing wordInfo offsets... 156 bytes

BUILD SUCCESSFUL in 5s
10 actionable tasks: 2 executed, 8 up-to-date
(base) ➜  elasticsearch-sudachi-es6.8 cd ~/workspace/elasticsearch-6.5.4/bin
(base) ➜  bin ./elasticsearch-plugin install ~/workspace/elasticsearch-sudachi-es6.8/build/distributions/analysis-sudachi-6.5.4-2.1.0.zip
ERROR: Unknown plugin /Users/liangzhi/workspace/elasticsearch-sudachi-es6.8/build/distributions/analysis-sudachi-6.5.4-2.1.0.zip

elasticsearch-plugin install takes a local plugin file as a URL rather than a bare path, so add the file:// prefix as shown below.

bin/elasticsearch-plugin install file:///Users/liangzhi/workspace/elasticsearch-sudachi-es6.8/build/distributions/analysis-sudachi-6.5.4-2.1.0.zip
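To avoid dropping the prefix again, the path can be converted to a URL before it is passed to the installer. This is only a minimal sketch: `to_file_url` is a hypothetical helper, not part of Elasticsearch or the plugin, and it assumes an absolute path with no characters that need percent-encoding (spaces, for example).

```shell
# Hypothetical helper (not part of elasticsearch-plugin): turn an absolute
# local path into a file:// URL. Assumes the path needs no percent-encoding.
to_file_url() {
  case "$1" in
    /*) printf 'file://%s\n' "$1" ;;   # absolute path: just add the scheme
    *)  printf 'error: path must be absolute: %s\n' "$1" >&2
        return 1 ;;
  esac
}
```

With this in place, the install command would read `bin/elasticsearch-plugin install "$(to_file_url "$HOME/workspace/elasticsearch-sudachi-es6.8/build/distributions/analysis-sudachi-6.5.4-2.1.0.zip")"`.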

Haha, yes, I forgot the file:// prefix.
It works now. Problem solved, thank you!