/neo4j-additional-analyzers

sample code for a custom fulltext analyzer for Neo4j

Primary LanguageJava

additional full text analyzers for Neo4j

The graph database Neo4j does support fulltext indexing out of the box. Part of this is fine grained configuration of analzyers for each index. There’s a good write up how to build your own custom analyzers. This project leverages couple of commonly used analyzers.

Installation

The analyzers are supposed to work with Neo4j >= 3.4. Clone this repo and use Maven to build a jar file:

mvn package
cp target/neo4j-additional-analyzers-<version>.jar <NEO4J_HOME>/plugins

and restart Neo4j.

Usage

Set up a fulltext index and specificy the custom index:

CALL db.index.fulltext.createNodeIndex("myindex", ["Articles"], ["abstract", "content"], {analyzers: "whitespace_lower"});
CALL db.index.fulltext.createNodeIndex("myindex", ["Articles"], ["abstract", "content"], {analyzers: "synonym"});

Reference of available analzyers

whitespace_lower

This analyzer combines the whitespace analyzer with German and English stopword lists and applies a toLower conversion as well. Due to usage of whitespace terms containing dashes, e.g. PNAS-108 as gene symbol or technical terms like x-270°C are treated as one atomic term.

synonym

Synonym analyzer is aware of gene symbols containing whitespace, e.g. cGK 1. Those multi-word terms will be mapping to a single word alternative using Lucene’s SynonymFilterFactory. Aside of this mapping, the synonym analyzer works exactly like whitespace_lower