BG-Stemmer

This is an experimental stemmer for Bulgarian. The two existing alternatives are Lucene's default lightweight rule-based stemmer and Preslav Nakov's BulStem, an inflectional stemmer.

This stemmer instead loads all word forms into a trie up front and then, for each input word, looks up the corresponding base form. It is less space-efficient than the other two, which rely only on rules, but benchmarks show that it is significantly faster than BulStem and on par with the default Lucene stemmer.
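The lookup idea can be illustrated with a minimal sketch. This is not the project's code: the class `FormTrie` and its methods are hypothetical, it uses a plain `HashMap`-based character trie rather than the PATRICIA trie the project depends on, and returning unknown words unchanged is an assumed fallback.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a character trie mapping each inflected word form
 * to its base form. A simple uncompressed trie, not a PATRICIA trie.
 */
public class FormTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String baseForm; // non-null only if a stored word form ends at this node
    }

    private final Node root = new Node();

    /** Stores a word form together with the base form it should stem to. */
    public void put(String wordForm, String baseForm) {
        Node node = root;
        for (int i = 0; i < wordForm.length(); i++) {
            node = node.children.computeIfAbsent(wordForm.charAt(i), c -> new Node());
        }
        node.baseForm = baseForm;
    }

    /** Returns the stored base form, or the word unchanged if it is not a known form (assumed fallback). */
    public String stem(String word) {
        Node node = root;
        for (int i = 0; i < word.length() && node != null; i++) {
            node = node.children.get(word.charAt(i));
        }
        return (node != null && node.baseForm != null) ? node.baseForm : word;
    }

    public static void main(String[] args) {
        FormTrie trie = new FormTrie();
        trie.put("книги", "книга");   // "books" -> "book"
        trie.put("книгата", "книга"); // "the book" -> "book"
        System.out.println(trie.stem("книгата")); // книга
        System.out.println(trie.stem("маса"));    // маса (not in the dictionary)
    }
}
```

Each lookup costs time proportional to the word's length, independent of dictionary size, which is the trade-off described above: more memory for the stored forms in exchange for fast, rule-free lookups.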

The dictionary, along with the affixation rules, is taken from OpenOffice.
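OpenOffice dictionaries use the Hunspell (formerly MySpell) `.dic`/`.aff` format, where suffix rules describe how a dictionary stem is inflected. The sketch below shows, under simplifying assumptions, how such rules could be expanded into the word forms that would populate a form-to-base-form trie. It ignores flags, prefixes and cross-products, and the `SuffixExpander` class and the sample rules are hypothetical illustrations, not the project's loader or the actual Bulgarian `.aff` contents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified Hunspell-style suffix expansion: a rule strips
 * characters from the end of a stem and appends others, provided the stem
 * ends with a match of the rule's condition.
 */
public class SuffixExpander {
    public static final class SuffixRule {
        final String strip;      // removed from the stem's end ("0" in .aff files means nothing)
        final String add;        // appended after stripping
        final Pattern condition; // the stem must end with a match of this pattern

        public SuffixRule(String strip, String add, String condition) {
            this.strip = strip;
            this.add = add;
            this.condition = Pattern.compile("(?:" + condition + ")$");
        }
    }

    /** Returns the stem itself plus every form produced by a rule whose condition matches. */
    public static List<String> expand(String stem, List<SuffixRule> rules) {
        List<String> forms = new ArrayList<>();
        forms.add(stem);
        for (SuffixRule rule : rules) {
            if (rule.condition.matcher(stem).find() && stem.endsWith(rule.strip)) {
                String base = stem.substring(0, stem.length() - rule.strip.length());
                forms.add(base + rule.add);
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        List<SuffixRule> rules = List.of(
                new SuffixRule("а", "и", "а"),    // книга -> книги
                new SuffixRule("", "та", "а"),    // книга -> книгата
                new SuffixRule("а", "ите", "а")); // книга -> книгите
        System.out.println(expand("книга", rules)); // [книга, книги, книгата, книгите]
    }
}
```

Every generated form would then be stored with the original stem as its base form, so the rules are applied once at load time rather than on every lookup.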

Integrating with Solr

Add the jar file (from the latest release), along with the Guava (v22) and lib/patricia-trie jars, to the classpath, and add the following to your Solr configuration:

<filter class="bg.bozho.stemmer.BulgarianStemFilterFactory"/>

Integrating with ElasticSearch

TODO
