Machine learning algorithms built on the distributed computing platform Hazelcast Jet.
See JetMLDemo for an example of how to use the Jet ML Pipeline.
git clone https://github.com/selvinsource/hazelcast-jet-ml.git
cd hazelcast-jet-ml
mvn clean compile test assembly:single
The Jet ML Pipeline lets you chain Estimators and Transformers:
- An Estimator is an algorithm that, given a dataset to fit, returns a Transformer
- A Transformer is an ML model that transforms one dataset into another
- A dataset is represented by a Hazelcast IListJet (which is not distributed; in a future version it will be replaced by a distributed IMapJet)
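The Estimator/Transformer contract can be sketched in plain Java as follows. This is a minimal, single-JVM illustration under stated assumptions: the interfaces below are simplified stand-ins that use java.util.List instead of IListJet, their signatures are not taken from hazelcast-jet-ml, and MeanCentering is a hypothetical estimator chosen only because it is short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified interfaces mirroring the Estimator/Transformer roles;
// the real hazelcast-jet-ml types work on IListJet and may have other signatures.
interface Transformer<T> {
    // Maps an input dataset to a new output dataset.
    List<T> transform(List<T> dataset);
}

interface Estimator<T> {
    // Learns from a dataset and returns a fitted Transformer.
    Transformer<T> fit(List<T> dataset);
}

// Hypothetical example estimator: learns the per-feature mean of the training
// data and returns a transformer that subtracts that mean (mean centering).
class MeanCentering implements Estimator<double[]> {
    @Override
    public Transformer<double[]> fit(List<double[]> dataset) {
        int dims = dataset.get(0).length;
        double[] mean = new double[dims];
        for (double[] row : dataset) {
            for (int j = 0; j < dims; j++) {
                mean[j] += row[j] / dataset.size();
            }
        }
        // The fitted "model" is a transformer that closes over the learned mean.
        return input -> {
            List<double[]> out = new ArrayList<>();
            for (double[] row : input) {
                double[] centered = new double[dims];
                for (int j = 0; j < dims; j++) {
                    centered[j] = row[j] - mean[j];
                }
                out.add(centered);
            }
            return out;
        };
    }
}
```

Chaining then has the same shape as in the pipeline examples: `new MeanCentering().fit(train).transform(test)` fits on one dataset and transforms another.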
The design is inspired by scikit-learn (see the paper).
The following datasets have been used:
Train a model and show identified clusters
// Create two Jet members
JetInstance instance1 = Jet.newJetInstance();
Jet.newJetInstance();
// Get a training dataset (it is assumed this is already populated, e.g. from a file)
IListJet<double[]> trainDataset = instance1.getList("trainDataset");
// Train a model on the training dataset with k = 3 and maxIter = 20:
// k = 3 is the number of desired clusters
// maxIter = 20 is the maximum number of iterations if the algorithm does not converge
KMeans kMeans = new KMeans(3, 20);
KMeansModel model = kMeans.fit(trainDataset);
// Show the identified centroids (LOGGER is assumed to be a logger declared in the enclosing class)
LOGGER.info("Centroids:");
model.getCentroids().forEach(c -> LOGGER.info(Arrays.toString(c)));
Jet.shutdownAll();
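To make the k and maxIter parameters concrete, here is a single-JVM sketch of Lloyd's k-means algorithm. It is an assumption that KMeans follows this general scheme; the code is not the library's distributed Jet implementation, the class LloydKMeans is hypothetical, and seeding the centroids from the first k points is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Single-JVM sketch of Lloyd's k-means (hypothetical class, not the Jet implementation).
class LloydKMeans {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    static int nearest(double[] point, List<double[]> centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.size(); c++) {
            double d = squaredDistance(point, centroids.get(c));
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    // Runs at most maxIter assignment/update rounds and stops early once the
    // centroids no longer change. Assumes data.size() >= k.
    static List<double[]> fit(List<double[]> data, int k, int maxIter) {
        int dims = data.get(0).length;
        List<double[]> centroids = new ArrayList<>();
        for (int c = 0; c < k; c++) {
            centroids.add(data.get(c).clone()); // simplistic seeding: first k points
        }
        for (int iter = 0; iter < maxIter; iter++) {
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            // Assignment step: attach each point to its nearest centroid.
            for (double[] p : data) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int j = 0; j < dims; j++) {
                    sums[c][j] += p[j];
                }
            }
            // Update step: move each centroid to the mean of its assigned points.
            List<double[]> updated = new ArrayList<>();
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    updated.add(centroids.get(c)); // keep the centroid of an empty cluster
                    continue;
                }
                double[] mean = new double[dims];
                for (int j = 0; j < dims; j++) {
                    mean[j] = sums[c][j] / counts[c];
                }
                updated.add(mean);
            }
            boolean converged = true;
            for (int c = 0; c < k; c++) {
                if (!Arrays.equals(centroids.get(c), updated.get(c))) {
                    converged = false;
                }
            }
            centroids = updated;
            if (converged) {
                break;
            }
        }
        return centroids;
    }
}
```

With maxIter acting only as an upper bound, a well-separated dataset typically converges in a few rounds and the loop exits early.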
Train a model and predict test data using the Jet ML Pipeline
// Create two Jet members
JetInstance instance1 = Jet.newJetInstance();
Jet.newJetInstance();
// Get datasets to train the model and then test it
IListJet<double[]> trainDataset = instance1.getList("trainDataset");
IListJet<double[]> testDataset = instance1.getList("testDataset");
// Create a KMeans estimator
Estimator<double[]> estimator = new KMeans(3, 20);
// Hazelcast Jet ML Pipeline: given the training dataset, the estimator (KMeans) returns a transformer (KMeansModel) that assigns clusters to the test dataset instances
IListJet<double[]> outputDataset = estimator.fit(trainDataset).transform(testDataset);
Jet.shutdownAll();
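The transform step, in which the fitted model assigns each test instance to a cluster, can be sketched as nearest-centroid labelling. The class NearestCentroidLabeler is hypothetical, and the output layout (the input row with the cluster index appended as a final column) is an assumption for illustration only; the actual contents of outputDataset are not specified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical transformer: labels each row with the index of its nearest centroid.
class NearestCentroidLabeler {
    private final List<double[]> centroids;

    NearestCentroidLabeler(List<double[]> centroids) {
        this.centroids = centroids;
    }

    // Returns a new dataset; each output row is the input row plus its cluster index
    // (an assumed layout, chosen for illustration).
    List<double[]> transform(List<double[]> testDataset) {
        List<double[]> output = new ArrayList<>();
        for (double[] row : testDataset) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < centroids.size(); c++) {
                // Squared Euclidean distance to centroid c.
                double dist = 0;
                for (int j = 0; j < row.length; j++) {
                    double d = row[j] - centroids.get(c)[j];
                    dist += d * d;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            double[] labeled = Arrays.copyOf(row, row.length + 1);
            labeled[row.length] = best;
            output.add(labeled);
        }
        return output;
    }
}
```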
Run the KMeans demo:
java -jar target/hazelcast-jet-ml-0.6.1-jar-with-dependencies.jar KMeans
See the demo for the full code.