blayze

A fast and flexible Bayesian Naive Bayes implementation for the JVM, written in Kotlin.

Fully supports the online learning paradigm, in which data, and even new features, are added as they become available.
Reasonably fast and memory efficient. We've trained a document classifier with tens of thousands of classes on hundreds of thousands of documents, and ironed out most of the hot-spots.
Works naturally with few samples by integrating out the uncertainty in the estimated parameters.
Models and data structures are immutable, which makes them concurrency friendly.
Efficient serialization and deserialization using protobuf.
Missing and unknown features at prediction time are properly handled.
Minimal dependencies.
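
To make the points about few samples, immutability, and unknown features concrete, here is a minimal sketch of a Bayesian estimate for a single categorical feature. It is not blayze's code: the class `DirichletCategorical`, its methods, and the symmetric pseudo-count `alpha` are hypothetical names chosen for illustration, and the sketch assumes a symmetric Dirichlet prior over a known number of possible values.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of the blayze API. */
final class DirichletCategorical {
    private final Map<String, Integer> counts; // observed count per value
    private final int total;                   // total number of observations
    private final double alpha;                // assumed symmetric Dirichlet pseudo-count

    DirichletCategorical(double alpha) {
        this(Collections.emptyMap(), 0, alpha);
    }

    private DirichletCategorical(Map<String, Integer> counts, int total, double alpha) {
        this.counts = Collections.unmodifiableMap(counts);
        this.total = total;
        this.alpha = alpha;
    }

    /** Online update: returns a new instance and leaves this one unchanged. */
    DirichletCategorical add(String value) {
        Map<String, Integer> next = new HashMap<>(counts);
        next.merge(value, 1, Integer::sum);
        return new DirichletCategorical(next, total + 1, alpha);
    }

    /**
     * Posterior predictive probability of a value, with the unknown
     * category probabilities integrated out under the Dirichlet prior.
     * Values never seen in training still receive non-zero probability.
     */
    double predictive(String value, int numPossibleValues) {
        int count = counts.getOrDefault(value, 0);
        return (count + alpha) / (total + alpha * numPossibleValues);
    }
}
```

With no data the estimate falls back to the prior (1/3 for each of three possible values), and each call to `add` yields a new object, so an older instance can keep serving predictions on another thread while updates are applied.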

Usage

Get the latest artifact from Maven Central.

// Java 9+ (uses List.of and Map.of); Model, Update, and Inputs come from blayze
Model model = new Model().batchAdd(List.of(new Update( // models are immutable, so batchAdd returns a new Model
        new Inputs( // supports multiple feature types
                Map.of( // text features
                        "subject", "Attention, is it true?", // features are named
                        "body", "Good day dear beneficiary. This is Secretary to president of Benin republic is writing this email ..." // multiple features of the same type have different names
                ),
                Map.of( // categorical features
                        "sender", "WWW.@galaxy.ocn.ne.jp"
                ),
                Map.of( // Gaussian features
                        "n_words", 482.0
                )
        ),
        "spam" // the outcome, in this case spam
)));

Map<String, Double> predictions = model.predict(new Inputs(/*...*/)); // e.g. {"spam": 0.624, "ham": 0.376}
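
The predictions above are one probability per outcome, summing to 1. As a rough sketch of how a Naive Bayes classifier can arrive at such a map, the snippet below turns per-outcome log scores into probabilities with the log-sum-exp trick. This is an assumption about the general technique, not a description of blayze's internals; `LogNormalizer` and `normalize` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of the blayze API. */
final class LogNormalizer {
    /** Converts unnormalized log scores into probabilities that sum to 1. */
    static Map<String, Double> normalize(Map<String, Double> logScores) {
        // Subtract the maximum before exponentiating to avoid underflow.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logScores.values()) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        for (double v : logScores.values()) {
            sum += Math.exp(v - max);
        }
        double logZ = max + Math.log(sum); // log of the normalizing constant

        Map<String, Double> probabilities = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            probabilities.put(e.getKey(), Math.exp(e.getValue() - logZ));
        }
        return probabilities;
    }
}
```

Working in log space matters for text features, where the product of many small per-word likelihoods would otherwise underflow to zero.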
