Solr VN Tokenizer

A Vietnamese tokenizer/analyzer plugin for Solr 8.5.2

The plugin supports VietnameseAnalyzer and VietnameseTokenizer.

VietnameseAnalyzer combines VietnameseTokenizer, LowerCaseFilter and StopFilter.

Install

Build the plugin

git clone https://github.com/tuan-nng/solr-vn-tokenizer
cd solr-vn-tokenizer
mvn package

Install plugin

Copy following jar files into solr server directory: <solr dir>/server/lib

solr-vn-analyzer-1.0.jar
VnCoreNLP-1.1.1.jar
commons-io.2.7.jar
all jars in target/lib which contain jaxb prefix in name
activation-1.1.1.jar in target/lib

Create new type for Vietnamese text

Assuming that you already created an index named test. Use this command to create a new field type for Vietnamese text

curl -X POST \
  http://localhost:8983/solr/test/schema \
  -d '{
  "add-field-type": {
    "name": "text_vn",
    "class": "solr.TextField",
    "indexAnalyzer": {
        "class": "org.apache.lucene.analysis.vi.VietnameseAnalyzer"
    },
    "queryAnalyzer": {
        "class": "org.apache.lucene.analysis.vi.VietnameseAnalyzer"
    }
  }
}'

Above command will create a new fieldType in managed-schema file as below.

<fieldType name="text_vn" class="solr.TextField">
    <analyzer type="index" class="org.apache.lucene.analysis.vi.VietnameseAnalyzer"/>
    <analyzer type="query" class="org.apache.lucene.analysis.vi.VietnameseAnalyzer"/>
</fieldType>

anhdowastaken/solr-vn-tokenizer

Solr VN Tokenizer

Install