solr_scaffold_template -- generate a simple solr analysis filter
This is a simple generator to get a mostly-novice up and running with
a custom analysis filter for their Solr project. See the base project
solr_scaffold for more information about this kind of filter, as well as
how to easily subclass solr.StrField with your own analysis (optionally
changing the stored version of the field as well).
Note: The "right" way to do this is almost certainly to get solr_scaffold
into a maven repository and/or create a maven archetype, but that seems
like a lot of work before we know if anyone cares.
Step 1: Generate
Suppose you want to make a filter to lowercase everything (ignoring that there are already better options).
- What should we call it? It must be a java classname and hence start
with an uppercase letter. So... Lowercasify
- What package should it be in? Filters in solr are in
something.something.solr.analysis, so I'll use com.billdueber.solr.analysis
git clone git@github.com:billdueber/solr_scaffold_template lowercasify
cd lowercasify
ruby generate.rb com.billdueber.solr.analysis Lowercasify
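If you want to sanity-check your two names before running the generator, something like the following sketch can help. It is not part of solr_scaffold_template: the class NameCheck and its two methods are made up for illustration, and I'm not assuming anything about what checks generate.rb itself performs. It relies only on the JDK's javax.lang.model.SourceVersion helpers.

```java
import javax.lang.model.SourceVersion;

/**
 * Hypothetical helper (not part of the generator) that checks whether
 * the names you plan to pass to generate.rb look reasonable.
 */
public class NameCheck {

    // A legal, non-keyword Java identifier that starts with an
    // uppercase letter, as this README asks for.
    public static boolean looksLikeClassName(String name) {
        return SourceVersion.isIdentifier(name)
            && !SourceVersion.isKeyword(name)
            && Character.isUpperCase(name.charAt(0));
    }

    // A dot-separated sequence of legal, non-keyword identifiers.
    public static boolean looksLikePackageName(String name) {
        return SourceVersion.isName(name);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeClassName("Lowercasify"));
        System.out.println(looksLikePackageName("com.billdueber.solr.analysis"));
    }
}
```

Both checks are purely syntactic; they say nothing about whether the package matches your project layout.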
Step 2: Edit the filter
Two files were generated at the end of your package hierarchy under
src/main/java: LowercasifyFilter.java and LowercasifyFilterFactory.java.
LowercasifyFilterFactory.java is probably fine just as it is, and you can
leave it alone.
LowercasifyFilter.java has a method in it, munge, which is where you
put your string transformation logic. It can all live in there, or you can
create other java files, pull other stuff in via the POM, etc.
package com.billdueber.solr.analysis;

import com.billdueber.solr_scaffold.analysis.SimpleFilter;
import org.apache.lucene.analysis.TokenStream;

/**
 * For most cases, all you need to do is edit the `munge` method
 * and leave the constructor alone.
 */
public class LowercasifyFilter extends SimpleFilter {

    public LowercasifyFilter(TokenStream aStream, Boolean echoInvalidInput) {
        super(aStream, echoInvalidInput);
    }

    @Override
    public String munge(String str) {
        // return str;
        return str.toLowerCase(); // I M SO SMRT!
    }
}
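Since munge is just a String-to-String function, it can be handy to try your transformation out in a plain Java class before wiring it into the filter and rebuilding. The following is a minimal sketch, not something generate.rb produces: the class name MungeSketch and its static munge method are made up for illustration. It also passes Locale.ROOT to toLowerCase, which is my own tweak so the result doesn't depend on the machine's default locale.

```java
import java.util.Locale;

/**
 * Hypothetical stand-alone harness (not generated by generate.rb) for
 * trying out the logic you plan to put in LowercasifyFilter.munge.
 */
public class MungeSketch {

    // Same idea as the munge method above, but with an explicit locale
    // so that, e.g., a Turkish default locale doesn't map "I" to a
    // dotless i.
    public static String munge(String str) {
        return str.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[] samples = { "Hello World", "SOLR", "already lower" };
        for (String s : samples) {
            System.out.println(s + " -> " + munge(s));
        }
    }
}
```

Once the output looks right, the body of munge can be copied into LowercasifyFilter.java as-is.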
Step 3: Build it
mvn package
That was easy.
Step 4: Get the .jar files where your solr will find them.
There are two .jar files you'll need to grab:
target/yourfilter-yourversion.jar
repo/com/billdueber/solr_scaffold/1.0/solr_scaffold-1.0.jar
These need to be put where your solr can find them. This is often
controlled via the solrconfig.xml file; I put this line in mine:
<lib dir="${solr.core.config}/lib" regex=".*\.jar"/>
...and then I can have a lib/ directory right next to conf/ in my solr
configuration.
Step 5: Use it in your schema.xml
<fieldType name="lowercasify" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="com.billdueber.solr.analysis.LowercasifyFilterFactory"/>
  </analyzer>
</fieldType>