Jieba analysis plugin for Elasticsearch. Supported Elasticsearch versions: 7.13.0, 7.4.2, 7.3.0, 7.0.0, 6.4.0, 6.0.0, 5.4.0, 5.3.0, 5.2.2, 5.2.1, 5.2.0, 5.1.2, 5.1.1
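- Supports adding dictionaries dynamically, without restarting ES.

Release 6.4.1 fixes the PositionIncrement issue. For details, see the article "ES分词PositionIncrement解析" (an explanation of PositionIncrement in ES tokenization).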
Branch | Tag | Elasticsearch version | Release link |
---|---|---|---|
7.13.0 | tag v7.7.1 | v7.13.0 | Download: v7.13.0 |
7.4.2 | tag v7.4.2 | v7.4.2 | Download: v7.4.2 |
7.3.0 | tag v7.3.0 | v7.3.0 | Download: v7.3.0 |
7.0.0 | tag v7.0.0 | v7.0.0 | Download: v7.0.0 |
6.4.0 | tag v6.4.1 | v6.4.0 | Download: v6.4.1 |
6.4.0 | tag v6.4.0 | v6.4.0 | Download: v6.4.0 |
6.0.0 | tag v6.0.0 | v6.0.0 | Download: v6.0.1 |
5.4.0 | tag v5.4.0 | v5.4.0 | Download: v5.4.0 |
5.3.0 | tag v5.3.0 | v5.3.0 | Download: v5.3.0 |
5.2.2 | tag v5.2.2 | v5.2.2 | Download: v5.2.2 |
5.2.1 | tag v5.2.1 | v5.2.1 | Download: v5.2.1 |
5.2 | tag v5.2.0 | v5.2.0 | Download: v5.2.0 |
5.1.2 | tag v5.1.2 | v5.1.2 | Download: v5.1.2 |
5.1.1 | tag v5.1.1 | v5.1.1 | Download: v5.1.1 |
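- Clone the source code of the right version, i.e. the branch from the table above that matches your Elasticsearch version:
```shell
git clone -b <branch> --recursive https://github.com/sing1ee/elasticsearch-jieba-plugin.git
cd elasticsearch-jieba-plugin
```
- Build the plugin:
```shell
./gradlew clean pz
```
- Copy the zip file to the plugins directory (the version in the file name depends on the branch you built; 5.1.2 is an example):
```shell
cp build/distributions/elasticsearch-jieba-plugin-5.1.2.zip ${path.home}/plugins
```
- Unzip it there and remove the zip file:
```shell
cd ${path.home}/plugins
unzip elasticsearch-jieba-plugin-5.1.2.zip
rm elasticsearch-jieba-plugin-5.1.2.zip
```
- Start Elasticsearch:
```shell
${path.home}/bin/elasticsearch
```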
To add your own words, put your dictionary file, with the suffix `.dict`, into `${path.home}/plugins/jieba/dic`. Each line holds a word and its frequency, separated by a space:
```
小清新 3
百搭 3
显瘦 3
隨身碟 100
your_word word_freq
```
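Before copying a dictionary into the `dic` folder, you may want to check it against the two-column layout shown above. The following is a minimal sketch, not part of the plugin: `check_dict` is a hypothetical helper. It enforces only the `word frequency` layout from this README, so the plugin's own parser may accept input that this check rejects.

```shell
#!/bin/sh
# check_dict: hypothetical helper (not shipped with the plugin).
# Checks a user dictionary against the "word frequency" layout shown above:
# the file name must end in .dict, and every non-blank line must have exactly
# two whitespace-separated fields, the second an integer frequency.
check_dict() {
  file="$1"
  case "$file" in
    *.dict) ;;
    *) echo "error: $file: name must end in .dict"; return 1 ;;
  esac
  [ -r "$file" ] || { echo "error: $file: not readable"; return 1; }
  awk '
    NF == 0 { next }                      # skip blank lines
    NF != 2 || $2 !~ /^[0-9]+$/ {
      printf "error: %s:%d: expected \"word frequency\", got: %s\n", FILENAME, NR, $0
      bad = 1
    }
    END { exit bad ? 1 : 0 }
  ' "$file"
}
```

For example, `check_dict my_words.dict && cp my_words.dict <es_home>/plugins/jieba/dic/` copies the file only when the check passes. Here `my_words.dict` and `<es_home>` are placeholders. Windows line endings (`\r\n`) make the frequency field fail the integer check, which surfaces that problem early.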
- Find `stopwords.txt` in `${path.home}/plugins/jieba/dic`.
- Create a folder named `stopwords` under `${path.home}/config`:
```shell
mkdir -p ${path.home}/config/stopwords
```
- Copy `stopwords.txt` into the folder you just created:
```shell
cp ${path.home}/plugins/jieba/dic/stopwords.txt ${path.home}/config/stopwords
```
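You can script the stopword steps above. The same script can also create the file that the `jieba_synonym` filter in the index settings reads from `config/synonyms/synonyms.txt`. This is a minimal sketch under these assumptions:

- `setup_jieba_config` is a hypothetical helper.
- It takes your Elasticsearch home directory (written as `${path.home}` in this README) as its only argument.
- The synonym line it writes is an illustrative placeholder in the Solr synonym format (comma-separated equivalent words). Replace it with your own entries.

```shell
#!/bin/sh
# setup_jieba_config: hypothetical helper. $1 is the Elasticsearch home
# directory (written as ${path.home} in this README).
setup_jieba_config() {
  es_home="$1"
  # Stop filter: "stopwords_path": "stopwords/stopwords.txt" is resolved
  # relative to the config directory, so copy the plugin's list there.
  mkdir -p "$es_home/config/stopwords" || return 1
  cp "$es_home/plugins/jieba/dic/stopwords.txt" "$es_home/config/stopwords/" || return 1
  # Synonym filter: "synonyms_path": "synonyms/synonyms.txt" is also resolved
  # relative to config, and the file has to exist when the index is created.
  mkdir -p "$es_home/config/synonyms" || return 1
  syn="$es_home/config/synonyms/synonyms.txt"
  # Write a placeholder entry only if no synonyms file exists yet,
  # so existing entries are never overwritten.
  if [ ! -f "$syn" ]; then
    printf '番茄,西红柿\n' > "$syn" || return 1
  fi
}
```

Run it once, for example `setup_jieba_config /path/to/elasticsearch` (a placeholder path), before creating the index. Running it again re-copies the stopword list but leaves an existing synonyms file untouched.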
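- Create an index whose analyzer `my_ana` uses the `jieba_index` tokenizer followed by the lowercase, stop and synonym filters. Both file paths are resolved relative to `${path.home}/config`, so `stopwords/stopwords.txt` and `synonyms/synonyms.txt` must exist before you create the index:
```
PUT http://localhost:9200/jieba_index
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type": "stop",
          "stopwords_path": "stopwords/stopwords.txt"
        },
        "jieba_synonym": {
          "type": "synonym",
          "synonyms_path": "synonyms/synonyms.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop",
            "jieba_synonym"
          ]
        }
      }
    }
  }
}
```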
- Test the analyzer (the `_analyze` API accepts `GET` or `POST`):
```
POST http://localhost:9200/jieba_index/_analyze
{
  "analyzer": "my_ana",
  "text": "黄河之水天上来"
}
```
The response is shown below. Note the overlapping tokens: `黄河` and the full phrase `黄河之水天上来` share position 0, and `天上` and `上来` share position 2.
```
{
  "tokens": [
    {
      "token": "黄河",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "黄河之水天上来",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    },
    {
      "token": "之水",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 1
    },
    {
      "token": "天上",
      "start_offset": 4,
      "end_offset": 6,
      "type": "word",
      "position": 2
    },
    {
      "token": "上来",
      "start_offset": 5,
      "end_offset": 7,
      "type": "word",
      "position": 2
    }
  ]
}
```
Migrated from jieba-solr.
I plan to add support for more analyzers:
- Stanford Chinese analyzer
- FudanNLP analyzer
- ...

If you have ideas, please open an issue, and we can work on them together.