
KNN text classification with Solr/Lucene More Like This



An example k-nearest-neighbors (KNN) classifier built on top of Solr and Lucene. It reads input in Weka's ARFF format.
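To illustrate the general idea, here is a minimal sketch of the voting step of a KNN classifier. It is not the code from this repository: it assumes that a similarity search (for example a More Like This query) has already returned neighbors, each with a class label and a score. The names `KnnVoteSketch`, `Neighbor`, and `classify` are hypothetical, and the tie-breaking rule (higher total score wins) is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KnnVoteSketch {

    /** Hypothetical holder for one search hit: its class label and similarity score. */
    public static final class Neighbor {
        final String label;
        final double score;

        public Neighbor(String label, double score) {
            this.label = label;
            this.score = score;
        }
    }

    /**
     * Predicts a label by majority vote among the k highest-scoring neighbors.
     * Ties are broken in favor of the label with the higher total score
     * (an assumption made for this sketch).
     */
    public static String classify(List<Neighbor> neighbors, int k) {
        if (neighbors.isEmpty() || k <= 0) {
            throw new IllegalArgumentException("need at least one neighbor and k > 0");
        }
        // Sort a copy by descending score and keep the top k.
        List<Neighbor> sorted = new ArrayList<>(neighbors);
        sorted.sort((a, b) -> Double.compare(b.score, a.score));
        List<Neighbor> top = sorted.subList(0, Math.min(k, sorted.size()));

        // Count votes and accumulate scores per label.
        Map<String, Integer> votes = new HashMap<>();
        Map<String, Double> scoreSums = new HashMap<>();
        for (Neighbor n : top) {
            votes.merge(n.label, 1, Integer::sum);
            scoreSums.merge(n.label, n.score, Double::sum);
        }

        // Pick the label with the most votes, using total score to break ties.
        String best = null;
        for (String label : votes.keySet()) {
            if (best == null
                    || votes.get(label) > votes.get(best)
                    || (votes.get(label).equals(votes.get(best))
                        && scoreSums.get(label) > scoreSums.get(best))) {
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up hits for illustration only.
        List<Neighbor> hits = Arrays.asList(
                new Neighbor("comedy", 4.2),
                new Neighbor("drama", 3.9),
                new Neighbor("comedy", 3.1),
                new Neighbor("drama", 0.5));
        System.out.println(classify(hits, 3)); // prints "comedy"
    }
}
```

With k = 3 the top hits are two "comedy" documents and one "drama" document, so the vote goes to "comedy". A real classifier would obtain the neighbors from the index rather than from a hard-coded list.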

You can try the program with this dataset:


Clone the repository with Git, then build it with Maven by running:

mvn clean install

(this generates a target directory containing an executable jar)

Create the index by executing:

java -Xmx512m -Dfile.encoding=UTF-8 -jar target/knn-classifier-1.0-SNAPSHOT-jar-with-dependencies.jar \
--action=index --input=/tmp/IMDB-F.arff --output=/tmp/solr --solrconf=./src/main/resources/solr/

Run your classification:

java -Xmx512m -Dfile.encoding=UTF-8 -jar target/knn-classifier-1.0-SNAPSHOT-jar-with-dependencies.jar \
--action=knn --input=/tmp/IMDB-F.arff --output=/tmp/solr --solrconf=./src/main/resources/solr/