JPMML-SkLearn
Java library and command-line application for converting Scikit-Learn models to PMML.
Features
- Supported Estimator and Transformer types:
- Clustering:
- Matrix Decomposition:
- Discriminant Analysis:
- Dummies:
- Ensemble Methods:
ensemble.AdaBoostRegressor
ensemble.BaggingClassifier
ensemble.BaggingRegressor
ensemble.ExtraTreesClassifier
ensemble.ExtraTreesRegressor
ensemble.GradientBoostingClassifier
ensemble.GradientBoostingRegressor
ensemble.IsolationForest
ensemble.RandomForestClassifier
ensemble.RandomForestRegressor
ensemble.VotingClassifier
- Feature Extraction:
- Feature Selection:
feature_selection.GenericUnivariateSelect (only via sklearn2pmml.SelectorProxy)
feature_selection.RFE (only via sklearn2pmml.SelectorProxy)
feature_selection.RFECV (only via sklearn2pmml.SelectorProxy)
feature_selection.SelectFdr (only via sklearn2pmml.SelectorProxy)
feature_selection.SelectFpr (only via sklearn2pmml.SelectorProxy)
feature_selection.SelectFromModel (either directly or via sklearn2pmml.SelectorProxy)
feature_selection.SelectFwe (only via sklearn2pmml.SelectorProxy)
feature_selection.SelectKBest (either directly or via sklearn2pmml.SelectorProxy)
feature_selection.SelectPercentile (only via sklearn2pmml.SelectorProxy)
feature_selection.VarianceThreshold (only via sklearn2pmml.SelectorProxy)
- Generalized Linear Models:
linear_model.ARDRegression
linear_model.BayesianRidge
linear_model.ElasticNet
linear_model.ElasticNetCV
linear_model.HuberRegressor
linear_model.Lars
linear_model.LarsCV
linear_model.Lasso
linear_model.LassoCV
linear_model.LassoLars
linear_model.LassoLarsCV
linear_model.LinearRegression
linear_model.LogisticRegression
linear_model.LogisticRegressionCV
linear_model.OrthogonalMatchingPursuit
linear_model.OrthogonalMatchingPursuitCV
linear_model.Ridge
linear_model.RidgeCV
linear_model.RidgeClassifier
linear_model.RidgeClassifierCV
linear_model.SGDClassifier
linear_model.SGDRegressor
linear_model.TheilSenRegressor
- Naive Bayes:
- Nearest Neighbors:
- Pipelines:
- Neural network models:
- Preprocessing and Normalization:
preprocessing.Binarizer
preprocessing.FunctionTransformer
preprocessing.Imputer
preprocessing.LabelBinarizer
preprocessing.LabelEncoder
preprocessing.MaxAbsScaler
preprocessing.MinMaxScaler
preprocessing.OneHotEncoder
preprocessing.PolynomialFeatures
preprocessing.RobustScaler
preprocessing.StandardScaler
- Support Vector Machines:
- Decision Trees:
- Supported third-party Estimator and Transformer types:
- LightGBM:
lightgbm.LGBMClassifier
lightgbm.LGBMRegressor
- SkLearn2PMML:
sklearn2pmml.EstimatorProxy
sklearn2pmml.SelectorProxy
sklearn2pmml.decoration.Alias
sklearn2pmml.decoration.CategoricalDomain
sklearn2pmml.decoration.ContinuousDomain
sklearn2pmml.decoration.MultiDomain
sklearn2pmml.pipeline.PMMLPipeline
sklearn2pmml.preprocessing.Aggregator
sklearn2pmml.preprocessing.CutTransformer
sklearn2pmml.preprocessing.ExpressionTransformer
sklearn2pmml.preprocessing.LookupTransformer
sklearn2pmml.preprocessing.MultiLookupTransformer
sklearn2pmml.preprocessing.PMMLLabelBinarizer
sklearn2pmml.preprocessing.PMMLLabelEncoder
sklearn2pmml.preprocessing.PowerFunctionTransformer
sklearn2pmml.preprocessing.StringNormalizer
- Sklearn-Pandas:
sklearn_pandas.CategoricalImputer
sklearn_pandas.DataFrameMapper
- TPOT:
tpot.builtins.stacking_estimator.StackingEstimator
- XGBoost:
- Production quality:
- Complete test coverage.
- Fully compliant with the JPMML-Evaluator library.
Prerequisites
The Python side of operations
- Python 2.7, 3.4 or newer.
- scikit-learn 0.16.0 or newer.
- sklearn-pandas 0.0.10 or newer.
- sklearn2pmml 0.14.0 or newer.
Python installation can be validated as follows:
import sklearn, sklearn.externals.joblib, sklearn_pandas, sklearn2pmml
print(sklearn.__version__)
print(sklearn.externals.joblib.__version__)
print(sklearn_pandas.__version__)
print(sklearn2pmml.__version__)
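The printed versions can also be compared against the minimums listed above programmatically. The following is a minimal sketch, not part of JPMML-SkLearn: the names `MINIMUM_VERSIONS`, `parse_version`, `meets_minimum` and `check_installation` are hypothetical, and the parser assumes plain dotted numeric versions (anything from the first non-numeric component onwards, such as `dev0`, is ignored).

```python
import importlib
import re

# Minimum versions as listed in the prerequisites above.
MINIMUM_VERSIONS = {
    "sklearn": "0.16.0",
    "sklearn_pandas": "0.0.10",
    "sklearn2pmml": "0.14.0",
}

def parse_version(version):
    """Turn a dotted version string into a tuple of ints, e.g. "0.19.1" -> (0, 19, 1).

    Parsing stops at the first component that does not start with a digit,
    which is an intentional simplification.
    """
    parts = []
    for component in version.split("."):
        match = re.match(r"\d+", component)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that "0.16" compares equal to "0.16.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b

def check_installation():
    """Import each prerequisite and report whether its version is sufficient."""
    report = {}
    for module_name, minimum in MINIMUM_VERSIONS.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "not installed"
            continue
        installed = module.__version__
        status = "ok" if meets_minimum(installed, minimum) else "too old"
        report[module_name] = "{} ({})".format(installed, status)
    return report
```

Comparing integer tuples rather than raw strings matters for versions such as 0.0.10, which sorts before 0.0.9 as a string but after it numerically.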
The JPMML-SkLearn side of operations
- Java 1.8 or newer.
Installation
Enter the project root directory and build using Apache Maven:
mvn clean install
The build produces an executable uber-JAR file target/converter-executable-1.5-SNAPSHOT.jar.
Usage
A typical workflow can be summarized as follows:
- Use Python to train a model.
- Serialize the model in pickle data format to a file in a local filesystem.
- Use the JPMML-SkLearn command-line converter application to convert the pickle file to a PMML file.
The Python side of operations
Load data into a pandas.DataFrame object:
import pandas
df = pandas.read_csv("Iris.csv")
iris_X = df[df.columns.difference(["Species"])]
iris_y = df["Species"]
First, instantiate a sklearn_pandas.DataFrameMapper object, which performs column-oriented feature engineering and selection work:
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import StandardScaler
from sklearn2pmml.decoration import ContinuousDomain
column_preprocessor = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler()])
])
Second, instantiate any number of Transformer and Selector objects, which perform table-oriented feature engineering and selection work:
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn2pmml import SelectorProxy
table_preprocessor = Pipeline([
    ("pca", PCA(n_components = 3)),
    ("selector", SelectorProxy(SelectKBest(k = 2)))
])
Please note that stateless Scikit-Learn selector objects need to be wrapped into an sklearn2pmml.SelectorProxy object.
Third, instantiate an Estimator object:
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(min_samples_leaf = 5)
Combine the above objects into a sklearn2pmml.pipeline.PMMLPipeline object, and run the experiment:
from sklearn2pmml.pipeline import PMMLPipeline
pipeline = PMMLPipeline([
    ("columns", column_preprocessor),
    ("table", table_preprocessor),
    ("classifier", classifier)
])
pipeline.fit(iris_X, iris_y)
Optionally, embed model verification data:
pipeline.verify(iris_X.sample(n = 15))
Store the fitted PMMLPipeline object in pickle data format:
from sklearn.externals import joblib
joblib.dump(pipeline, "pipeline.pkl.z", compress = 9)
Please see the test script file main.py for more classification (binary and multi-class) and regression workflows.
The JPMML-SkLearn side of operations
Converting the pipeline pickle file pipeline.pkl.z to a PMML file pipeline.pmml:
java -jar target/converter-executable-1.5-SNAPSHOT.jar --pkl-input pipeline.pkl.z --pmml-output pipeline.pmml
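When the conversion needs to run at the end of a Python training script, the same command can be launched with the standard library. This is a minimal sketch that assumes `java` is on the PATH and the uber-JAR has already been built. The helper names `build_converter_command` and `convert` are hypothetical; only the `--pkl-input` and `--pmml-output` options shown above are used.

```python
import subprocess

def build_converter_command(jar_path, pkl_path, pmml_path):
    """Assemble the converter command line as an argument list."""
    return [
        "java", "-jar", jar_path,
        "--pkl-input", pkl_path,
        "--pmml-output", pmml_path,
    ]

def convert(jar_path, pkl_path, pmml_path):
    """Run the converter; raises subprocess.CalledProcessError on a non-zero exit code."""
    subprocess.check_call(build_converter_command(jar_path, pkl_path, pmml_path))
```

For the files used in this document, the call would be `convert("target/converter-executable-1.5-SNAPSHOT.jar", "pipeline.pkl.z", "pipeline.pmml")`. Passing the arguments as a list avoids shell quoting issues with file paths that contain spaces.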
Getting help:
java -jar target/converter-executable-1.5-SNAPSHOT.jar --help
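After a conversion has produced pipeline.pmml, a quick sanity check is to list the fields declared in its data dictionary. The sketch below is not part of JPMML-SkLearn; the helper names `local_name` and `list_data_fields` are hypothetical. It relies only on the PMML convention that input and target fields appear as `DataField` elements with `name`, `optype` and `dataType` attributes. Because the XML namespace URI depends on the PMML schema version of the file, the code matches on local element names instead of a fixed namespace.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the "{namespace}" prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]

def list_data_fields(pmml_source):
    """Return (name, optype, dataType) for every DataField element in a PMML document.

    pmml_source is a file path or a file object, as accepted by ET.parse.
    """
    root = ET.parse(pmml_source).getroot()
    fields = []
    for element in root.iter():
        if local_name(element.tag) == "DataField":
            fields.append((
                element.get("name"),
                element.get("optype"),
                element.get("dataType"),
            ))
    return fields
```

Calling `list_data_fields("pipeline.pmml")` on the file converted above should include the four measurement columns and the Species target, assuming the converter declares each of them in the data dictionary.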
License
JPMML-SkLearn is dual-licensed under the GNU Affero General Public License (AGPL) version 3.0, and a commercial license.
Additional information
JPMML-SkLearn is developed and maintained by Openscoring Ltd, Estonia.
Openscoring Ltd offers a wide variety of products and services in the field of applied predictive analytics. Please subscribe to the Openscoring Ltd newsletter for periodic updates about JPMML and Openscoring software projects.