JPMML-SkLearn

Java library and command-line application for converting Scikit-Learn models to PMML.

Features

Supported Estimator and Transformer types:
- Clustering:
  - cluster.KMeans
  - cluster.MiniBatchKMeans
- Matrix Decomposition:
  - decomposition.PCA
  - decomposition.IncrementalPCA
- Discriminant Analysis:
  - discriminant_analysis.LinearDiscriminantAnalysis
- Dummies:
  - dummy.DummyClassifier
  - dummy.DummyRegressor
- Ensemble Methods:
- Feature Extraction:
- Feature Selection:
  - feature_selection.GenericUnivariateSelect (only via sklearn2pmml.SelectorProxy)
  - feature_selection.RFE (only via sklearn2pmml.SelectorProxy)
  - feature_selection.RFECV (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFdr (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFpr (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFromModel (either directly or via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectFwe (only via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectKBest (either directly or via sklearn2pmml.SelectorProxy)
  - feature_selection.SelectPercentile (only via sklearn2pmml.SelectorProxy)
  - feature_selection.VarianceThreshold (only via sklearn2pmml.SelectorProxy)
- Generalized Linear Models:
- Naive Bayes:
  - naive_bayes.GaussianNB
- Nearest Neighbors:
  - neighbors.KNeighborsClassifier
  - neighbors.KNeighborsRegressor
- Pipelines:
  - pipeline.FeatureUnion
  - pipeline.Pipeline
- Neural network models:
  - neural_network.MLPClassifier
  - neural_network.MLPRegressor
- Preprocessing and Normalization:
- Support Vector Machines:
- Decision Trees:

Supported third-party Estimator and Transformer types:
- LightGBM:
  - lightgbm.LGBMClassifier
  - lightgbm.LGBMRegressor
- SkLearn2PMML:
  - sklearn2pmml.EstimatorProxy
  - sklearn2pmml.SelectorProxy
  - sklearn2pmml.decoration.Alias
  - sklearn2pmml.decoration.CategoricalDomain
  - sklearn2pmml.decoration.ContinuousDomain
  - sklearn2pmml.decoration.MultiDomain
  - sklearn2pmml.pipeline.PMMLPipeline
  - sklearn2pmml.preprocessing.Aggregator
  - sklearn2pmml.preprocessing.CutTransformer
  - sklearn2pmml.preprocessing.ExpressionTransformer
  - sklearn2pmml.preprocessing.LookupTransformer
  - sklearn2pmml.preprocessing.MultiLookupTransformer
  - sklearn2pmml.preprocessing.PMMLLabelBinarizer
  - sklearn2pmml.preprocessing.PMMLLabelEncoder
  - sklearn2pmml.preprocessing.PowerFunctionTransformer
  - sklearn2pmml.preprocessing.StringNormalizer
- Sklearn-Pandas:
  - sklearn_pandas.CategoricalImputer
  - sklearn_pandas.DataFrameMapper
- TPOT:
  - tpot.builtins.stacking_estimator.StackingEstimator
- XGBoost:
  - xgboost.XGBClassifier
  - xgboost.XGBRegressor

Production quality:
- Complete test coverage.
- Fully compliant with the JPMML-Evaluator library.

Prerequisites

The Python side of operations

- Python 2.7, 3.4 or newer.
- scikit-learn 0.16.0 or newer.
- sklearn-pandas 0.0.10 or newer.
- sklearn2pmml 0.14.0 or newer.

Python installation can be validated as follows:

import sklearn, sklearn.externals.joblib, sklearn_pandas, sklearn2pmml

print(sklearn.__version__)
print(sklearn.externals.joblib.__version__)
print(sklearn_pandas.__version__)
print(sklearn2pmml.__version__)
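
The check above only prints version strings. The following sketch also compares them against the minimum versions listed above. The helper names (parse_version, meets_minimum, check_installation) are hypothetical, and only the leading numeric components of each version are compared, so pre-release suffixes such as "rc1" are ignored.

```python
import importlib
import re

# Minimum versions as listed under Prerequisites (module name -> version)
MINIMUM_VERSIONS = {
    "sklearn": "0.16.0",
    "sklearn_pandas": "0.0.10",
    "sklearn2pmml": "0.14.0",
}

def parse_version(version):
    """Turn "0.19.1" into (0, 19, 1); stop at the first non-numeric component."""
    parts = []
    for token in version.split("."):
        match = re.match(r"\d+", token)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def meets_minimum(version, minimum):
    """Return True if version >= minimum, comparing numeric components only."""
    v, m = parse_version(version), parse_version(minimum)
    width = max(len(v), len(m))
    # Pad the shorter tuple with zeros so that "0.16" compares equal to "0.16.0"
    v += (0,) * (width - len(v))
    m += (0,) * (width - len(m))
    return v >= m

def check_installation():
    """Map each module name to (installed version or None, meets minimum?)."""
    report = {}
    for module_name, minimum in MINIMUM_VERSIONS.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = (None, False)
            continue
        version = getattr(module, "__version__", "0")
        report[module_name] = (version, meets_minimum(version, minimum))
    return report

# Usage: for name, (version, ok) in check_installation().items(): print(name, version, ok)
```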

The JPMML-SkLearn side of operations

Java 1.8 or newer.

Installation

Enter the project root directory and build using Apache Maven:

mvn clean install

The build produces an executable uber-JAR file target/converter-executable-1.5-SNAPSHOT.jar.

Usage

A typical workflow can be summarized as follows:

- Use Python to train a model.
- Serialize the model in pickle data format to a file in the local filesystem.
- Use the JPMML-SkLearn command-line converter application to turn the pickle file into a PMML file.

The Python side of operations

Load data to a pandas.DataFrame object:

import pandas

df = pandas.read_csv("Iris.csv")

iris_X = df[df.columns.difference(["Species"])]
iris_y = df["Species"]
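
Note that Index.difference sorts its result by default, so iris_X holds the feature columns in alphabetical order rather than in file order. The sketch below shows this on a few hypothetical rows standing in for Iris.csv (the values are illustrative only).

```python
import pandas

# Hypothetical stand-in for Iris.csv; values are illustrative only
df = pandas.DataFrame({
    "Sepal.Length": [5.1, 7.0, 6.3],
    "Sepal.Width": [3.5, 3.2, 3.3],
    "Petal.Length": [1.4, 4.7, 6.0],
    "Petal.Width": [0.2, 1.4, 2.5],
    "Species": ["setosa", "versicolor", "virginica"],
})

# difference() sorts the remaining column names, so file order is not preserved
iris_X = df[df.columns.difference(["Species"])]
iris_y = df["Species"]

print(list(iris_X.columns))
# ['Petal.Length', 'Petal.Width', 'Sepal.Length', 'Sepal.Width']
```

The DataFrameMapper selects columns by name, so this reordering does not change which columns it transforms.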

First, instantiate a sklearn_pandas.DataFrameMapper object, which performs column-oriented feature engineering and selection work:

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import StandardScaler
from sklearn2pmml.decoration import ContinuousDomain

column_preprocessor = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler()])
])
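
StandardScaler rescales each of the four columns independently. With its default settings it subtracts the column mean and divides by the population standard deviation (denominator n, not n - 1). The arithmetic is sketched below in plain Python; standardize is a hypothetical helper for illustration, not part of scikit-learn or sklearn2pmml.

```python
import math

def standardize(values):
    """Scale one column like a default StandardScaler: (x - mean) / std."""
    n = len(values)
    mean = sum(values) / float(n)
    # Population variance: divide by n, not n - 1
    variance = sum((x - mean) ** 2 for x in values) / float(n)
    std = math.sqrt(variance)
    if std == 0.0:
        # Constant column: keep the scale at 1 instead of dividing by zero
        std = 1.0
    return [(x - mean) / std for x in values]

scaled = standardize([4.0, 5.0, 6.0, 9.0])
```

After scaling, each column has mean 0 and (population) variance 1.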

Second, instantiate any number of Transformer and Selector objects, which perform table-oriented feature engineering and selection work:

from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn2pmml import SelectorProxy

table_preprocessor = Pipeline([
    ("pca", PCA(n_components = 3)),
    ("selector", SelectorProxy(SelectKBest(k = 2)))
])

Please note that stateless Scikit-Learn selector objects need to be wrapped in an sklearn2pmml.SelectorProxy object.
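
SelectorProxy comes from sklearn2pmml, so a plain scikit-learn version of table_preprocessor without the wrapper is enough to inspect what the two steps do to the shape of the data. This sketch assumes the Iris copy bundled with scikit-learn (sklearn.datasets.load_iris) instead of Iris.csv, and skips the column preprocessing.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline

# Bundled Iris data, used here instead of Iris.csv
X, y = load_iris(return_X_y = True)

# Same steps as table_preprocessor, minus the SelectorProxy wrapper
table_preprocessor = Pipeline([
    ("pca", PCA(n_components = 3)),     # 4 input columns -> 3 principal components
    ("selector", SelectKBest(k = 2))    # keep the 2 components with the highest f_classif scores
])
Xt = table_preprocessor.fit_transform(X, y)
print(Xt.shape)  # (150, 2)

# Boolean mask over the 3 principal components: which ones were kept
print(table_preprocessor.named_steps["selector"].get_support())
```

SelectKBest is supervised (its default scoring function, f_classif, needs y), which is why the target is passed to fit_transform.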

Third, instantiate an Estimator object:

from sklearn.tree import DecisionTreeClassifier

classifier = DecisionTreeClassifier(min_samples_leaf = 5)
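
The min_samples_leaf = 5 setting means a split is accepted only if each resulting branch keeps at least 5 training samples. The sketch below checks this on the fitted tree structure. It assumes the bundled Iris data and adds random_state only to make the tree reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y = True)

# random_state is added only to make the fitted tree reproducible
classifier = DecisionTreeClassifier(min_samples_leaf = 5, random_state = 1)
classifier.fit(X, y)

tree = classifier.tree_
# A node is a leaf when it has no left child (children_left == -1)
leaf_sizes = [tree.n_node_samples[i] for i in range(tree.node_count) if tree.children_left[i] == -1]

print(min(leaf_sizes) >= 5)  # True: no leaf holds fewer than 5 training samples
print(sum(leaf_sizes))       # 150: every training sample ends up in exactly one leaf
```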

Combine the above objects into a sklearn2pmml.pipeline.PMMLPipeline object, and run the experiment:

from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([
    ("columns", column_preprocessor),
    ("table", table_preprocessor),
    ("classifier", classifier)
])
pipeline.fit(iris_X, iris_y)
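
Before sklearn-pandas and sklearn2pmml are installed, the estimator chain can be prototyped with a plain scikit-learn Pipeline. In this stand-in, StandardScaler replaces the DataFrameMapper (there is no ContinuousDomain decoration), SelectKBest is used without its wrapper, and the bundled Iris data replaces Iris.csv. It only checks the modelling logic and does not replace the PMMLPipeline object that the conversion workflow stores.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y = True)

# Plain scikit-learn stand-in for the PMMLPipeline above
pipeline = Pipeline([
    ("columns", StandardScaler()),                # instead of the DataFrameMapper step
    ("table", Pipeline([
        ("pca", PCA(n_components = 3)),
        ("selector", SelectKBest(k = 2))          # unwrapped, no SelectorProxy
    ])),
    ("classifier", DecisionTreeClassifier(min_samples_leaf = 5, random_state = 1))
])
pipeline.fit(X, y)

# Training-set accuracy, as a quick sanity check of the chain
print(pipeline.score(X, y))
```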

Optionally, embed model verification data:

pipeline.verify(iris_X.sample(n = 15))
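
DataFrame.sample draws a different random subset on every run unless random_state is fixed. The sketch below shows a reproducible verification sample. The 150-row frame is a hypothetical stand-in for iris_X, and the verify call is left as a comment because it needs the fitted PMMLPipeline.

```python
import pandas

# Hypothetical stand-in for iris_X: 150 rows of illustrative numeric values
iris_X = pandas.DataFrame({
    "Petal.Length": [1.0 + (i % 60) / 10.0 for i in range(150)],
    "Petal.Width": [0.1 + (i % 25) / 10.0 for i in range(150)],
    "Sepal.Length": [4.3 + (i % 36) / 10.0 for i in range(150)],
    "Sepal.Width": [2.0 + (i % 24) / 10.0 for i in range(150)],
})

# Fixing random_state selects the same 15 rows on every run
verification_X = iris_X.sample(n = 15, random_state = 13)
assert verification_X.index.equals(iris_X.sample(n = 15, random_state = 13).index)

# pipeline.verify(verification_X)  # with the fitted PMMLPipeline
```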

Store the fitted PMMLPipeline object in pickle data format:

from sklearn.externals import joblib

joblib.dump(pipeline, "pipeline.pkl.z", compress = 9)
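
A quick dump-and-reload round trip catches serialization problems before the file reaches the converter. The sketch assumes that newer scikit-learn releases no longer ship sklearn.externals.joblib and falls back to the standalone joblib package. Whether a given JPMML-SkLearn version reads files written by the standalone package is not covered here. The dict is a placeholder for the fitted pipeline, and dump_and_reload is a hypothetical helper.

```python
import os
import tempfile

try:
    # Import path used by the scikit-learn versions listed under Prerequisites
    from sklearn.externals import joblib
except ImportError:
    # Assumption: newer scikit-learn lacks sklearn.externals.joblib; use standalone joblib
    import joblib

def dump_and_reload(obj, filename, compress = 9):
    """Dump obj to filename with joblib, then load it back and return it."""
    joblib.dump(obj, filename, compress = compress)
    return joblib.load(filename)

# Placeholder object; in the workflow above this would be the fitted pipeline
original = {"steps": ["columns", "table", "classifier"]}
path = os.path.join(tempfile.mkdtemp(), "pipeline.pkl.z")
reloaded = dump_and_reload(original, path)
print(reloaded == original)  # True
```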

Please see the test script file main.py for more classification (binary and multi-class) and regression workflows.

The JPMML-SkLearn side of operations

Converting the pipeline pickle file pipeline.pkl.z to a PMML file pipeline.pmml:

java -jar target/converter-executable-1.5-SNAPSHOT.jar --pkl-input pipeline.pkl.z --pmml-output pipeline.pmml
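
To run the conversion from the same Python session that trained the model, the command line can be wrapped with subprocess. build_command and convert are hypothetical helpers. The JAR path and the --pkl-input/--pmml-output options are taken from the command above, and java is assumed to be on the PATH.

```python
import subprocess

# JAR path as produced by the Maven build described under Installation
CONVERTER_JAR = "target/converter-executable-1.5-SNAPSHOT.jar"

def build_command(pkl_input, pmml_output, jar = CONVERTER_JAR):
    """Assemble the converter invocation as an argument list (no shell quoting needed)."""
    return ["java", "-jar", jar, "--pkl-input", pkl_input, "--pmml-output", pmml_output]

def convert(pkl_input, pmml_output, jar = CONVERTER_JAR):
    """Run the converter; subprocess raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(build_command(pkl_input, pmml_output, jar))

# Example (requires Java and the built JAR):
# convert("pipeline.pkl.z", "pipeline.pmml")
```

Passing an argument list rather than a single string avoids shell quoting issues with file names that contain spaces.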

Getting help:

java -jar target/converter-executable-1.5-SNAPSHOT.jar --help

License

JPMML-SkLearn is dual-licensed under the GNU Affero General Public License (AGPL) version 3.0, and a commercial license.

Additional information

JPMML-SkLearn is developed and maintained by Openscoring Ltd, Estonia.

Openscoring Ltd offers a wide variety of products and services in the field of applied predictive analytics. Please subscribe to the Openscoring Ltd newsletter for periodic updates about JPMML and Openscoring software projects.
