A fast and flexible Bayesian Naive Bayes implementation for the JVM, written in Kotlin.
- Fully supports the online learning paradigm, in which data, and even new features, are added as they become available.
- Reasonably fast and memory-efficient. We've trained a document classifier with tens of thousands of classes on hundreds of thousands of documents, and ironed out most of the hot spots.
- Works naturally with few samples by integrating out the uncertainty in the estimated parameters.
- Models and data structures are immutable, which makes them concurrency-friendly.
- Efficient serialization and deserialization using protobuf.
- Missing and unknown features at prediction time are properly handled.
- Minimal dependencies.
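
To make "integrating out the uncertainty" concrete, here is a minimal sketch for a single categorical feature. It assumes a symmetric Dirichlet prior with concentration `alpha` over a vocabulary of known size; the class name `DirichletPredictive`, its parameters, and the choice of prior are illustrative assumptions and may differ from what the library actually does. Under that prior, the posterior predictive probability of a value has the closed form (n_value + alpha) / (N + alpha * K), so a value that was never observed still receives non-zero probability, and estimates from few samples are pulled toward uniform.

```java
import java.util.Map;

/** Hypothetical helper, not part of the library's API. */
final class DirichletPredictive {

    /**
     * Posterior predictive probability that the next observation equals {@code value},
     * with the category probabilities integrated out under a symmetric Dirichlet(alpha) prior:
     * (n_value + alpha) / (N + alpha * K).
     */
    static double predictive(Map<String, Integer> counts, String value,
                             double alpha, int vocabularySize) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        int seen = counts.getOrDefault(value, 0); // unseen values count as zero
        return (seen + alpha) / (total + alpha * vocabularySize);
    }

    public static void main(String[] args) {
        // One sender observed twice; the vocabulary is assumed to hold 4 possible senders.
        Map<String, Integer> counts = Map.of("alice@example.com", 2);
        System.out.println(predictive(counts, "alice@example.com", 1.0, 4)); // (2 + 1) / (2 + 4)
        System.out.println(predictive(counts, "unseen@example.com", 1.0, 4)); // (0 + 1) / (2 + 4)
    }
}
```

With no observations at all, the same formula returns 1 / K for every value, which is the prior mean.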
Get the latest artifact from Maven Central.
// Java 9
Model model = new Model().batchAdd(List.of(new Update( // Models are immutable
        new Inputs( // Supports multiple feature types
                Map.of( // Text features
                        "subject", "Attention, is it true?", // Features are named
                        "body", "Good day dear beneficiary. This is Secretary to president of Benin republic is writing this email ..." // Multiple features of the same type have different names
                ),
                Map.of( // Categorical features
                        "sender", "WWW.@galaxy.ocn.ne.jp"
                ),
                Map.of( // Gaussian features
                        "n_words", 482.
                )
        ),
        "spam" // The outcome, in this case spam
)));
Map<String, Double> predictions = model.predict(new Inputs(/*...*/)); // e.g. {"spam": 0.624, "ham": 0.376}
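
The example above assigns the result of `batchAdd` to a variable because models are immutable. The pattern behind that can be sketched with a toy class: every update returns a new instance and leaves the old one untouched, so older versions can be shared safely between threads. `ImmutableCounts` and its methods are hypothetical names used only for illustration, and this sketch copies the whole map on each update, which a real implementation would likely avoid.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of an immutable, incrementally updated outcome counter. */
final class ImmutableCounts {
    private final Map<String, Integer> counts;

    ImmutableCounts() {
        this(Collections.emptyMap());
    }

    private ImmutableCounts(Map<String, Integer> counts) {
        this.counts = counts;
    }

    /** Returns a new instance with one more observation; this instance is not modified. */
    ImmutableCounts add(String outcome) {
        Map<String, Integer> next = new HashMap<>(counts); // copy, never mutate in place
        next.merge(outcome, 1, Integer::sum);
        return new ImmutableCounts(Collections.unmodifiableMap(next));
    }

    int count(String outcome) {
        return counts.getOrDefault(outcome, 0);
    }

    public static void main(String[] args) {
        ImmutableCounts empty = new ImmutableCounts();
        ImmutableCounts updated = empty.add("spam").add("spam").add("ham");
        System.out.println(empty.count("spam"));   // 0: the original is unchanged
        System.out.println(updated.count("spam")); // 2
        System.out.println(updated.count("ham"));  // 1
    }
}
```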
- Kotlin - Language
- Maven - Dependency Management
- Protocol Buffers - Serialization
We use SemVer for versioning.