Elasticsearch Vietnamese Analysis Plugin

Primary language: Java. License: GNU Lesser General Public License v3.0 (LGPL-3.0).

Vietnamese Analysis Plugin for Elasticsearch

The Vietnamese Analysis plugin integrates Vietnamese language analysis into Elasticsearch. It provides the following components:

Analyzer: vi_analyzer. Tokenizer: vi_tokenizer. Filter: vi_stop. The vi_analyzer itself is composed of the vi_tokenizer and the vi_stop filter.

The tokenizer uses coccoc-tokenizer for tokenization.
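As a rough sketch of how these components might be used once the plugin is installed, the helpers below build request bodies for the standard Elasticsearch _analyze and index-creation APIs. The names vi_analyzer, vi_tokenizer and vi_stop come from the list above; the node URL, the helper function names and the analyzer name my_vi_analyzer are assumptions made for illustration.

```shell
# Assumption: a node with the plugin installed is reachable at this URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# JSON body for the _analyze API using the plugin's vi_analyzer.
# The text is inserted verbatim, so it must not contain double quotes
# or backslashes.
analyze_body() {
  printf '{"analyzer":"vi_analyzer","text":"%s"}' "$1"
}

# Index settings that define a custom analyzer from vi_tokenizer and vi_stop.
# "my_vi_analyzer" is a hypothetical name.
custom_analyzer_settings() {
  printf '%s' '{"settings":{"analysis":{"analyzer":{"my_vi_analyzer":{"type":"custom","tokenizer":"vi_tokenizer","filter":["vi_stop"]}}}}}'
}

# Run a piece of text through vi_analyzer and print the raw JSON response.
analyze_vi() {
  curl -s -X POST "$ES_URL/_analyze" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$1")"
}

# Create an index (name given as $1) that carries the custom analyzer.
create_vi_index() {
  curl -s -X PUT "$ES_URL/$1" \
    -H 'Content-Type: application/json' \
    -d "$(custom_analyzer_settings)"
}
```

For example, analyze_vi "xin chào" would send one request to the node and print the tokens it returns. The custom analyzer mirrors what vi_analyzer is described as doing; its main use would be as a starting point for adding further filters.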

Installation

Choose a version from the releases page to install:

elasticsearch-plugin install https://github.com/sun-asterisk-research/elasticsearch-analysis-vi/releases/download/<release>/<bundle>

Alternatively, build from source and install the resulting plugin bundle:

elasticsearch-plugin install file:///path/to/plugin
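After installing, it can help to confirm that the node actually sees the plugin. Elasticsearch loads plugins only at startup, so restart the node first. The helper below is a minimal sketch: the function name plugin_installed is hypothetical, and the name fragment you pass to it is an assumption, since the exact name under which the plugin is listed is not stated here.

```shell
# Succeeds if any line on stdin contains the given name fragment.
# Meant to be fed the output of `elasticsearch-plugin list`, which prints
# one installed plugin name per line.
plugin_installed() {
  grep -qF -- "$1"
}
```

A possible invocation is elasticsearch-plugin list | plugin_installed analysis-vi && echo "plugin found", where analysis-vi is only a guess at a fragment of the listed name.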

Supported versions

Branch    Elasticsearch version
master    7.4+
7.3       7.0 - 7.3
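The table above can be read as a simple rule for picking a branch. The function below is a sketch of that rule, not part of the project; its name is hypothetical, and treating 8.x and later as covered by "7.4+" is an assumption.

```shell
# Print the branch to build from for a given Elasticsearch version
# (expects at least major.minor, e.g. 7.3.1).
branch_for_es_version() {
  case "$1" in
    [0-9]*.[0-9]*) ;;
    *) echo "expected a version like 7.3.1, got: $1" >&2; return 2 ;;
  esac
  major="${1%%.*}"      # text before the first dot
  rest="${1#*.}"        # text after the first dot
  minor="${rest%%.*}"   # second version component
  if [ "$major" -eq 7 ] && [ "$minor" -le 3 ]; then
    echo "7.3"
  elif [ "$major" -ge 7 ]; then
    # Assumption: "7.4+" also covers later major versions.
    echo "master"
  else
    echo "unsupported Elasticsearch version: $1" >&2
    return 1
  fi
}
```

For instance, git checkout "$(branch_for_es_version 7.3.1)" would switch to the 7.3 branch before building.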

Build from source

Building requires the following dependencies: JDK (version 11 or later), make, cmake, and libstdc++. Pay attention to your libstdc++ version: a plugin built against a newer libstdc++ will not work on systems that ship an older one.
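A quick pre-flight check for the command-line dependencies might look like the sketch below. Both helper names are hypothetical. libstdc++ is a library rather than a command, so it is not covered here.

```shell
# Print every tool from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Extract the major Java version from `java -version` output, e.g.
# 'openjdk version "11.0.2" ...' gives 11 and 'java version "1.8.0_201"' gives 8.
java_major() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) ver="${ver#1.}" ;;   # legacy 1.x numbering: drop the leading "1."
  esac
  echo "${ver%%[!0-9]*}"      # keep only the leading digits
}
```

Typical use would be missing_tools java make cmake, which prints nothing when all three are present, followed by [ "$(java_major "$(java -version 2>&1)")" -ge 11 ] to check the JDK requirement.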

First update the git submodules:

git submodule update --init

Build and bundle the plugin:

./gradlew assemble

To build for a different Elasticsearch version, add -PelasticsearchVersion=<version> to the build command. Make sure the branch you are building from supports that version (see Supported versions). For example, to build for Elasticsearch 7.3.1 from the 7.3 branch:

./gradlew assemble -PelasticsearchVersion=7.3.1

To run tests:

./gradlew check