A Vietnamese tokenizer/analyzer plugin for Solr 8.5.2
The plugin supports VietnameseAnalyzer and VietnameseTokenizer.
VietnameseAnalyzer combines VietnameseTokenizer, LowerCaseFilter and StopFilter.
- Build the plugin
git clone https://github.com/tuan-nng/solr-vn-tokenizer
cd solr-vn-tokenizer
mvn package
- Install the plugin
Copy the following jar files into the Solr server library directory, <solr dir>/server/lib:
- solr-vn-analyzer-1.0.jar
- VnCoreNLP-1.1.1.jar
- commons-io.2.7.jar
- all jars in target/lib whose names have the jaxb prefix
- activation-1.1.1.jar in target/lib
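The copy step above can be scripted. The following is a minimal sketch, not part of the plugin: the function name install_vn_jars is hypothetical, and it assumes that every jar listed above ends up somewhere under the build's target directory after mvn package. The commons-io jar is matched with a wildcard so the exact file name does not matter.

```shell
#!/bin/sh
# Sketch: copy the jars listed above into <solr dir>/server/lib.
# Assumption: all of them live somewhere under the Maven target/ directory.
install_vn_jars() {
    target_dir=$1              # e.g. solr-vn-tokenizer/target
    solr_dir=$2                # the <solr dir> root
    dest="$solr_dir/server/lib"

    # Refuse to run if the Solr library directory does not exist.
    [ -d "$dest" ] || { echo "missing directory: $dest" >&2; return 1; }

    # Named jars, located by file name anywhere under target/.
    for name in 'solr-vn-analyzer-1.0.jar' 'VnCoreNLP-1.1.1.jar' 'commons-io*.jar'; do
        find "$target_dir" -name "$name" -exec cp {} "$dest/" \;
    done

    # jaxb-prefixed jars and the activation jar from target/lib.
    cp "$target_dir"/lib/jaxb*.jar "$target_dir"/lib/activation-1.1.1.jar "$dest/"
}
```

A call would look like install_vn_jars ./target /opt/solr, where /opt/solr is only a placeholder for your Solr directory. Solr most likely needs a restart before it picks up the new jars.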
- Create a new field type for Vietnamese text
Assuming you have already created an index named test, use this command to create a new field type for Vietnamese text:
curl -X POST \
  http://localhost:8983/solr/test/schema \
  -d '{
    "add-field-type": {
      "name": "text_vn",
      "class": "solr.TextField",
      "indexAnalyzer": {
        "class": "org.apache.lucene.analysis.vi.VietnameseAnalyzer"
      },
      "queryAnalyzer": {
        "class": "org.apache.lucene.analysis.vi.VietnameseAnalyzer"
      }
    }
  }'
The command above creates a new fieldType in the managed-schema file, as shown below.
<fieldType name="text_vn" class="solr.TextField">
  <analyzer type="index" class="org.apache.lucene.analysis.vi.VietnameseAnalyzer"/>
  <analyzer type="query" class="org.apache.lucene.analysis.vi.VietnameseAnalyzer"/>
</fieldType>
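Once the field type exists, a field has to use it before any text is analyzed as Vietnamese. The sketch below shows one possible way to do that with Solr's Schema API and to inspect the result with the field analysis handler. The helper names and the SOLR_URL variable are hypothetical, the field name passed in is up to you, and the curl calls assume Solr is running with the test index from above.

```shell
#!/bin/sh
# Sketch: helpers around the text_vn field type (names are illustrative).
SOLR_URL=${SOLR_URL:-http://localhost:8983/solr/test}

# Print the JSON body of an add-field command for the given field name and type.
add_field_payload() {
    printf '{"add-field": {"name": "%s", "type": "%s", "indexed": true, "stored": true}}\n' "$1" "$2"
}

# Add a field of type text_vn to the schema (needs a running Solr).
add_vn_field() {
    add_field_payload "$1" text_vn |
        curl -sS -X POST -H 'Content-type:application/json' \
             --data-binary @- "$SOLR_URL/schema"
}

# Fetch the text_vn definition to confirm that it was created.
show_vn_type() {
    curl -sS "$SOLR_URL/schema/fieldtypes/text_vn"
}

# Run a sample string through the text_vn analysis chain and print the response.
analyze_vn_text() {
    curl -sS -G "$SOLR_URL/analysis/field" \
         --data-urlencode "analysis.fieldtype=text_vn" \
         --data-urlencode "analysis.fieldvalue=$1"
}
```

For example, add_vn_field content_vn would add a hypothetical field called content_vn, and analyze_vn_text with a short Vietnamese sentence returns the tokens produced at each stage of the analyzer.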