JPMML-SkLearn

Java library and command-line application for converting [Scikit-Learn](http://scikit-learn.org/) models to PMML.

Features

Prerequisites

The Python side of operations

Python installation can be validated as follows:

import sklearn, pandas, sklearn_pandas, joblib, numpy

print(sklearn.__version__)
print(pandas.__version__)
print(sklearn_pandas.__version__)
print(joblib.__version__)
print(numpy.__version__)
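The check above stops at the first package that fails to import. As an alternative, the following minimal sketch reports every package in one pass by querying installed distribution metadata. It assumes Python 3.8 or newer (for `importlib.metadata`) and that the packages are installed under their usual PyPI distribution names; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names as published on PyPI (an assumption; they can differ
# from the import names used in the snippet above).
REQUIRED = ["scikit-learn", "pandas", "sklearn-pandas", "joblib", "numpy"]


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(name, version if version is not None else "NOT INSTALLED")
```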

The JPMML-SkLearn side of operations

  • Java 1.7 or newer.

Installation

Enter the project root directory and build using [Apache Maven](http://maven.apache.org/):

mvn clean install

The build produces an executable uber-JAR file target/converter-executable-1.0-SNAPSHOT.jar.

Usage

A typical workflow can be summarized as follows:

  1. Use Python to train a model.
  2. Serialize the model in pickle data format to a file in a local filesystem.
  3. Use the JPMML-SkLearn command-line converter application to convert the pickle file to a PMML file.

The Python side of operations

Load data into a pandas.DataFrame object:

import pandas

iris_df = pandas.read_csv("Iris.csv")

Describe data and data pre-processing actions by creating an appropriate sklearn_pandas.DataFrameMapper object:

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn_pandas import DataFrameMapper

iris_mapper = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [StandardScaler(), PCA(n_components = 3)]),
    ("Species", None)
])

iris = iris_mapper.fit_transform(iris_df)

Train an appropriate estimator object:

from sklearn.ensemble import RandomForestClassifier

iris_X = iris[:, 0:3]
iris_y = iris[:, 3]

iris_forest = RandomForestClassifier(min_samples_leaf = 5)
iris_forest.fit(iris_X, iris_y)

Serialize the sklearn_pandas.DataFrameMapper object and estimator object in pickle data format:

import joblib

joblib.dump(iris_mapper, "mapper.pkl", compress = 9)
joblib.dump(iris_forest, "estimator.pkl", compress = 9)

Please see the test script file [main.py](https://github.com/jpmml/jpmml-sklearn/blob/master/src/test/resources/main.py) for more classification (binary and multi-class) and regression workflows.

The JPMML-SkLearn side of operations

Converting the estimator pickle file estimator.pkl to a PMML file estimator.pmml:

java -jar target/converter-executable-1.0-SNAPSHOT.jar --pkl-input estimator.pkl --pmml-output estimator.pmml

Converting the sklearn_pandas.DataFrameMapper pickle file mapper.pkl and the estimator pickle file estimator.pkl to a PMML file mapper-estimator.pmml:

java -jar target/converter-executable-1.0-SNAPSHOT.jar --pkl-mapper-input mapper.pkl --pkl-estimator-input estimator.pkl --pmml-output mapper-estimator.pmml
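After a conversion it can be useful to confirm that the output file is well-formed XML with a `PMML` root element. The following is a minimal sketch using only the Python standard library; the helper names `local_name` and `summarize_pmml` are hypothetical, and the check says nothing about whether the document is valid against the PMML schema.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def summarize_pmml(source):
    """Return (root element name, version attribute, top-level child names).

    `source` is a file path or a binary file object, as accepted by ET.parse.
    """
    root = ET.parse(source).getroot()
    children = [local_name(child.tag) for child in root]
    return local_name(root.tag), root.get("version"), children
```

For example, `summarize_pmml("mapper-estimator.pmml")` should report `PMML` as the root element name, followed by the version attribute and the names of the top-level elements found in the file.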

Getting help:

java -jar target/converter-executable-1.0-SNAPSHOT.jar --help
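The conversion commands above can also be launched from the same Python script that trains and serializes the model. The following is a minimal sketch, assuming `java` is on the `PATH` and the script runs from the project root; the function names `converter_command` and `convert` are hypothetical, and only the command-line options shown in this README are used. Treating a non-zero exit status as failure is an assumption about the converter's behavior.

```python
import subprocess

# Path of the uber-JAR produced by the build step described in this README.
CONVERTER_JAR = "target/converter-executable-1.0-SNAPSHOT.jar"


def converter_command(estimator_pkl, pmml_output, mapper_pkl=None, jar=CONVERTER_JAR):
    """Build the argument list for one converter run.

    Without a mapper, the estimator is passed via --pkl-input; with a mapper,
    the --pkl-mapper-input / --pkl-estimator-input pair is used instead.
    """
    command = ["java", "-jar", jar]
    if mapper_pkl is None:
        command += ["--pkl-input", estimator_pkl]
    else:
        command += ["--pkl-mapper-input", mapper_pkl,
                    "--pkl-estimator-input", estimator_pkl]
    command += ["--pmml-output", pmml_output]
    return command


def convert(estimator_pkl, pmml_output, mapper_pkl=None, jar=CONVERTER_JAR):
    """Run the converter; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(converter_command(estimator_pkl, pmml_output, mapper_pkl, jar),
                   check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file names contain spaces.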

License

JPMML-SkLearn is licensed under the [GNU Affero General Public License (AGPL) version 3.0](http://www.gnu.org/licenses/agpl-3.0.html). Other licenses are available on request.

Additional information

Please contact [info@openscoring.io](mailto:info@openscoring.io).