
Solr contextual synonyms using payloads

Primary language: Java. License: Apache License 2.0 (Apache-2.0).

solr-payload-synonyms

Add "contextual" synonyms to Solr using payloads.

Contextual Synonyms

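A single term can refer to different concepts within the same field or document. For example, in "Bill Clinton talked about the bill", the first "Bill" is part of a person's name while the second "bill" is a piece of legislation. We use the term "contextual synonym" for a synonym that applies only to one (or more) specific tokens within a field, rather than to every occurrence of the term.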

The principle behind this component is explained in this post. This code is provided as a companion to that post, although a very similar approach has been used in a production environment.

Build

To build the project, run:

mvn -e package

Installation

Take the .jar file from the target/ directory and add it to your Solr/Fusion installation. Then add the filter to one of your field types:

<fieldtype name="payloads" stored="false" indexed="true" class="solr.TextField" >
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|" encoder="identity"/>
    <filter class="solr.custom.PayloadSynonymTokenFilterFactory"/>
  </analyzer>
</fieldtype>
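To make the first two stages of this chain concrete, here is a plain-Java model of what WhitespaceTokenizer and DelimitedPayloadTokenFilter (delimiter="|", encoder="identity") do to the input text. It is only a sketch: it does not use Lucene, and the class DelimitedPayloadDemo and its Token record are hypothetical names. It assumes, as Lucene's filter does, that the text before the first delimiter becomes the term and the text after it is stored as UTF-8 payload bytes, while tokens without a delimiter get no payload. It needs Java 16+ for the record.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DelimitedPayloadDemo {

    /** Simplified stand-in for a Lucene token: term text plus optional payload bytes. */
    public record Token(String term, byte[] payload) {}

    /**
     * Hypothetical model of WhitespaceTokenizer + DelimitedPayloadTokenFilter
     * with the identity encoder: split on whitespace, then split each token
     * at the first delimiter into term and UTF-8 payload bytes.
     */
    public static List<Token> tokenize(String text, char delimiter) {
        List<Token> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            int i = raw.indexOf(delimiter);
            if (i < 0) {
                tokens.add(new Token(raw, null)); // no delimiter: no payload
            } else {
                byte[] payload = raw.substring(i + 1).getBytes(StandardCharsets.UTF_8);
                tokens.add(new Token(raw.substring(0, i), payload));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Bill|Clinton talked about the bill", '|')) {
            String payload = t.payload() == null
                    ? "-"
                    : new String(t.payload(), StandardCharsets.UTF_8);
            System.out.println(t.term() + " payload=" + payload);
        }
    }
}
```

Running main prints "Bill payload=Clinton" for the first token, and the remaining tokens (talked, about, the, bill) have no payload. The payload filter therefore only records which token should receive a synonym; emitting the synonym is left to the next filter in the chain.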

Once the field type is defined, you can use the Analysis page of the Solr Admin UI to check that everything works as expected. Enter the test string "Bill|Clinton talked about the bill" in the Field Value (Index) input and select the payloads field type; the output should be similar to the one shown in the figure.

Figure: Analysis page of the Solr Admin UI for the payloads field type.

A quick inspection reveals that the tokens Bill and Clinton share the same position, and that the Clinton token has the type SYNONYM.
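The behaviour described above can be modelled in plain Java. Every token that carries a payload is followed by an extra token whose term is the payload, with a position increment of 0 (so it shares the original token's position) and the type SYNONYM. This is a sketch under assumptions, not the repository's PayloadSynonymTokenFilter: it skips Lucene's attribute API, the names PayloadSynonymDemo, Tagged and Token are hypothetical, and payloads are assumed to be already decoded to strings. Ordinary tokens use "word", Lucene's default token type.

```java
import java.util.ArrayList;
import java.util.List;

public class PayloadSynonymDemo {

    /** Input token: term plus its decoded payload (null when there is none). */
    public record Tagged(String term, String payload) {}

    /** Output token: term, Lucene-style position increment and token type. */
    public record Token(String term, int positionIncrement, String type) {}

    /** Emits each term, followed by a SYNONYM token on the same position if it has a payload. */
    public static List<Token> expand(List<Tagged> input) {
        List<Token> out = new ArrayList<>();
        for (Tagged t : input) {
            out.add(new Token(t.term(), 1, "word")); // ordinary token: advance one position
            if (t.payload() != null) {
                out.add(new Token(t.payload(), 0, "SYNONYM")); // increment 0: stacked on same position
            }
        }
        return out;
    }

    /** Turns position increments into absolute 1-based positions. */
    public static int[] positions(List<Token> tokens) {
        int[] pos = new int[tokens.size()];
        int current = 0;
        for (int i = 0; i < tokens.size(); i++) {
            current += tokens.get(i).positionIncrement();
            pos[i] = current;
        }
        return pos;
    }

    public static void main(String[] args) {
        List<Token> tokens = expand(List.of(
                new Tagged("Bill", "Clinton"),
                new Tagged("talked", null),
                new Tagged("about", null),
                new Tagged("the", null),
                new Tagged("bill", null)));
        int[] pos = positions(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(pos[i] + " " + tokens.get(i).term() + " " + tokens.get(i).type());
        }
    }
}
```

Running main prints Bill and Clinton both at position 1 (types word and SYNONYM), followed by talked, about, the and bill at positions 2 to 5. Only the first Bill receives the synonym; the later bill, which has no payload, is left untouched, which is what makes the synonym contextual. A real Lucene TokenFilter would typically achieve the same effect by buffering the synonym and emitting it on the following incrementToken() call.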