Issue

Custom dictionaries do not work on JDK 9, so set targetCompatibility to 1.8.

All published releases were built on JDK 9.

Build and Install

Install lib

gradle mvn

Import HanLP data

Download the HanLP data; see HanLP Releases.
Change ${data.root} in the config to your own HanLP data root directory.
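
As a rough sketch of the substitution step, the snippet below replaces the ${data.root} placeholder in a piece of config text. The function name, the sample config line, and the /opt/hanlp path are all hypothetical; the plugin's actual config file name and keys are not given here.

```python
def set_data_root(config_text: str, data_root: str) -> str:
    """Replace every ${data.root} placeholder with the given directory."""
    # Drop a trailing slash so paths built from the root do not contain "//".
    return config_text.replace("${data.root}", data_root.rstrip("/"))


# Hypothetical config line, used only to illustrate the substitution.
sample = "root=${data.root}/"
updated = set_data_root(sample, "/opt/hanlp/")
print(updated)
```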

Modify Plugin Security Policy

Edit ${elasticsearchHome}/config/jvm.options and add the following line at the end:

-Djava.security.policy=file://${elasticsearchHome}/plugins/analysis-hanlp/plugin-security.policy
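
If you prefer to script this step, a minimal sketch follows. It appends the option to jvm.options only when it is not already present. The helper name add_security_policy is hypothetical, and es_home stands in for ${elasticsearchHome}.

```python
from pathlib import Path


def add_security_policy(es_home: str) -> bool:
    """Append the plugin's security policy option to config/jvm.options.

    Returns True if the line was added, False if it was already present.
    """
    option = (
        "-Djava.security.policy=file://"
        f"{es_home}/plugins/analysis-hanlp/plugin-security.policy"
    )
    jvm_options = Path(es_home) / "config" / "jvm.options"
    text = jvm_options.read_text(encoding="utf-8")
    if option in text.splitlines():
        return False
    # Make sure the new option starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    jvm_options.write_text(text + option + "\n", encoding="utf-8")
    return True
```

Running it a second time leaves the file unchanged, so it is safe to include in a provisioning script.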

Index and Highlight

Two analyzers are supported:

HanLPAnalyzer: the standard analyzer, alias hanlp
HanLPIndexAnalyzer: the index analyzer, alias hanlp-index

Test Analyzer

GET /_analyze

{
  "analyzer" : "hanlp-index",
  "text": ["中华人民共和国","地大物博"]
}

The response is:

{
  "tokens": [
    {
      "token": "中华人民共和国",
      "start_offset": 0,
      "end_offset": 7,
      "type": "ns",
      "position": 0
    },
    {
      "token": "中华人民",
      "start_offset": 0,
      "end_offset": 4,
      "type": "nz",
      "position": 1
    },
    {
      "token": "中华",
      "start_offset": 0,
      "end_offset": 2,
      "type": "nz",
      "position": 2
    },
    {
      "token": "华人",
      "start_offset": 1,
      "end_offset": 3,
      "type": "n",
      "position": 3
    },
    {
      "token": "人民共和国",
      "start_offset": 2,
      "end_offset": 7,
      "type": "nz",
      "position": 4
    },
    {
      "token": "人民",
      "start_offset": 2,
      "end_offset": 4,
      "type": "n",
      "position": 5
    },
    {
      "token": "共和国",
      "start_offset": 4,
      "end_offset": 7,
      "type": "n",
      "position": 6
    },
    {
      "token": "共和",
      "start_offset": 4,
      "end_offset": 6,
      "type": "n",
      "position": 7
    },
    {
      "token": "地大物博",
      "start_offset": 8,
      "end_offset": 12,
      "type": "nz",
      "position": 8
    },
    {
      "token": "地大",
      "start_offset": 8,
      "end_offset": 10,
      "type": "nz",
      "position": 9
    }
  ]
}
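
To see how the offsets in this response map back to the input, here is a small sketch. It assumes the two array values are laid out end to end with a one-character gap between them. That gap is inferred from the offsets shown (the second value starts at 8, not 7), not from anything the plugin documents.

```python
texts = ["中华人民共和国", "地大物博"]

# Assumption: one placeholder character separates consecutive values.
combined = " ".join(texts)

# (token, start_offset, end_offset) triples copied from the response.
tokens = [
    ("中华人民共和国", 0, 7),
    ("中华人民", 0, 4),
    ("中华", 0, 2),
    ("华人", 1, 3),
    ("人民共和国", 2, 7),
    ("人民", 2, 4),
    ("共和国", 4, 7),
    ("共和", 4, 6),
    ("地大物博", 8, 12),
    ("地大", 8, 10),
]


def offsets_match(token, start, end):
    """True if the slice [start, end) of the combined text equals the token."""
    return combined[start:end] == token


# Collect any token whose offsets do not point at its own text.
mismatches = [t for t in tokens if not offsets_match(*t)]
print(mismatches)
```

An empty list means every start_offset/end_offset pair points at exactly the token text, which is what highlighting relies on.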

Mapping

PUT test/_mapping/test

{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "hanlp-index",
      "search_analyzer": "hanlp-index",
      "index_options": "offsets"
    }
  }
}

Index Document

PUT /test/test/1

{
  "content": ["中华人民共和国","地大物博"]
}

Highlight

POST /test/test/_search

{
  "query": {
    "match": {
      "content": "中华"
    }
  },
  "highlight": {
    "pre_tags": [
      "<tag1>"
    ],
    "post_tags": [
      "</tag1>"
    ],
    "fields": {
      "content": {}
    }
  }
}
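
The highlight request could be scripted roughly as follows. This is a sketch using only the Python standard library. The helper names and the http://localhost:9200 address are assumptions, and nothing is sent until send() is called against a running node with the plugin installed.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of a local node


def build_highlight_search(field, text, pre_tag="<tag1>", post_tag="</tag1>"):
    """Build the (method, path, body) triple for the highlight search."""
    body = {
        "query": {"match": {field: text}},
        "highlight": {
            "pre_tags": [pre_tag],
            "post_tags": [post_tag],
            "fields": {field: {}},
        },
    }
    return "POST", "/test/test/_search", body


def send(method, path, body, base_url=ES_URL):
    """Send one JSON request to Elasticsearch and return the decoded reply."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Build the request only; call send(method, path, body) to execute it.
method, path, body = build_highlight_search("content", "中华")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Calling send(method, path, body) would run the search; Elasticsearch returns the matching fragments under each hit's highlight key, wrapped in the chosen tags.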