Spark wrapper around jasonsperske's Java port of flashtext.py

Primary language: Java. License: MIT.

FlashTextSpark

Introduces SparkKeywordProcessor, a thin Scala wrapper around the FlashTextJava library by jasonsperske. FlashTextJava is itself a Java port of flashtext.py.

The motivation was to run FlashText on Spark to efficiently tag millions of unstructured documents with matches against a large corpus of keywords (also numbering in the millions).
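For readers new to FlashText, the following is a minimal sketch of the general idea behind it: keywords are stored in a character trie, and each document is scanned once for the longest keyword that starts and ends on a word boundary. This is not the FlashTextJava or SparkKeywordProcessor API. The class KeywordTrieSketch and its methods addKeyword and extractKeywords are hypothetical names used only for illustration, and the case-insensitive matching is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a hypothetical class, not part of FlashTextJava or this wrapper.
public class KeywordTrieSketch {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        String keyword; // non-null when a keyword ends at this node
    }

    private final Node root = new Node();

    // Insert a keyword into the trie, one character per level (lower-cased).
    public void addKeyword(String keyword) {
        Node node = root;
        for (char c : keyword.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.keyword = keyword;
    }

    // Scan the text left to right and return the keywords found, in order.
    public List<String> extractKeywords(String text) {
        List<String> found = new ArrayList<>();
        String lower = text.toLowerCase(Locale.ROOT);
        int i = 0;
        while (i < lower.length()) {
            // Only start a match at a word boundary.
            if (i > 0 && Character.isLetterOrDigit(lower.charAt(i - 1))) {
                i++;
                continue;
            }
            Node node = root;
            String longest = null;
            int longestEnd = -1;
            int j = i;
            while (j < lower.length()) {
                node = node.children.get(lower.charAt(j));
                if (node == null) {
                    break;
                }
                j++;
                // Accept a keyword only if it also ends at a word boundary.
                boolean atBoundary = j == lower.length()
                        || !Character.isLetterOrDigit(lower.charAt(j));
                if (node.keyword != null && atBoundary) {
                    longest = node.keyword;
                    longestEnd = j;
                }
            }
            if (longest != null) {
                found.add(longest);
                i = longestEnd; // resume scanning after the matched keyword
            } else {
                i++;
            }
        }
        return found;
    }
}
```

In this sketch, the work per document depends on the document length and the length of the longest keyword rather than on the number of keywords, which is the property that makes this style of matching attractive for very large keyword sets.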

Building

Clone the repo, then, if you are on UNIX, run:

./gradlew build

or on Windows:

./gradlew.bat build

This bootstraps the project and downloads all of its dependencies. The only prerequisite is that Java 8 is installed.