
Exploring machine learning on top of a Lucene index

Primary language: Java
License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)

Lucene Classification Library
by Marek Schmidt <fregaham@gmail.com>

A library for applying machine learning algorithms when the training set is
small and the unlabelled document set is huge, e.g. to classify categories in
Wikipedia.

The main idea is to avoid going through every document in the collection.
Instead, the library goes through only the best n term vectors, chosen by
feature selection, and so explores only a subset of the whole dataset.
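
The idea above can be illustrated with a small, self-contained sketch. It is
not the library's code and it does not use the Lucene API. It assumes one
possible reading of the paragraph: feature selection picks the n terms that
best separate the labelled examples, and only documents containing those
terms are visited. The class CandidateSelection, its methods, the simple
count-difference score, and the in-memory postings map are all hypothetical
names and choices made for illustration. In a real Lucene 2.9 index the
posting lists would come from something like IndexReader.termDocs(Term).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not part of the library.
public class CandidateSelection {

    // Assumed scoring rule: (positive docs containing the term)
    // minus (negative docs containing the term). Ties break alphabetically.
    static List<String> bestTerms(List<Set<String>> positives,
                                  List<Set<String>> negatives, int n) {
        final Map<String, Integer> score = new HashMap<>();
        for (Set<String> doc : positives) {
            for (String t : doc) score.merge(t, 1, Integer::sum);
        }
        for (Set<String> doc : negatives) {
            for (String t : doc) score.merge(t, -1, Integer::sum);
        }
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> {
            int c = Integer.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    // Collects the ids of documents that contain at least one selected term.
    // Documents outside these posting lists are never touched.
    static Set<Integer> candidates(Map<String, List<Integer>> postings,
                                   List<String> terms) {
        Set<Integer> ids = new TreeSet<>();
        for (String t : terms) {
            List<Integer> list = postings.get(t);
            ids.addAll(list != null ? list : Collections.<Integer>emptyList());
        }
        return ids;
    }
}
```

With this scheme the work grows with the length of the selected posting
lists rather than with the size of the whole collection, which is the point
of the approach described above.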

Requirements
    lucene-core-2.9.1.jar  Apache Lucene library (other versions should probably also work)

Version 0.1

* Implemented a basic Naive Bayes classifier
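
For readers unfamiliar with the technique, here is a minimal sketch of what a
basic Naive Bayes text classifier can look like. It is a generic multinomial
variant with add-one (Laplace) smoothing, written for illustration only. The
class name TinyNaiveBayes and its methods are hypothetical and do not
describe the classifier shipped in this version.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the library's classifier.
public class TinyNaiveBayes {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> totalTerms = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Adds one labelled document, given as a list of its terms.
    public void train(String label, List<String> terms) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
            termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : terms) {
            counts.merge(t, 1, Integer::sum);
            totalTerms.merge(label, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    // log P(label) + sum of log P(term | label), with add-one smoothing.
    // The label must have been seen in train().
    public double logScore(String label, List<String> terms) {
        Map<String, Integer> counts = termCounts.get(label);
        int total = totalTerms.getOrDefault(label, 0);
        double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
        for (String t : terms) {
            int c = counts.getOrDefault(t, 0);
            score += Math.log((c + 1.0) / (total + vocabulary.size()));
        }
        return score;
    }

    // Returns the trained label with the highest log score, or null if
    // nothing has been trained yet.
    public String classify(List<String> terms) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            double s = logScore(label, terms);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }
}
```

Working in log space avoids numeric underflow when a document has many
terms, and the smoothing keeps an unseen term from forcing a probability of
zero.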