JPMML-SparkML plugin for converting XGBoost4J-Spark models to PMML.
Prerequisites:
- Apache Spark 2.0.X or 2.1.X.
- XGBoost4J-Spark 0.7.
Enter the project root directory and build using Apache Maven:
mvn clean install
The build installs the JPMML-SparkML-XGBoost library into the local repository using the coordinates org.jpmml:jpmml-sparkml-xgboost:1.0-SNAPSHOT.
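Other JVM projects can only use the installed artifact if their build resolves the local Maven repository. The following is a minimal sketch that assumes an sbt build. This project itself uses Maven and does not prescribe any sbt setup. Apache Spark is assumed to be supplied by the runtime environment, so it is not listed here:

```scala
// build.sbt (sketch): also resolve artifacts from the local Maven repository (~/.m2/repository),
// where `mvn clean install` placed the SNAPSHOT build
resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  // Coordinates installed by the Maven build of this project
  "org.jpmml" % "jpmml-sparkml-xgboost" % "1.0-SNAPSHOT",
  // XGBoost4J-Spark runtime, same coordinates as passed to spark-shell --packages
  "ml.dmlc" % "xgboost4j-spark" % "0.7"
)
```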
The JPMML-SparkML-XGBoost library extends the JPMML-SparkML library with support for the ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel and ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel prediction model classes.
Launch the Spark shell, passing the XGBoost-extended JPMML-SparkML-Package JAR file with --jars and using --packages to include the XGBoost4J-Spark runtime dependency:
spark-shell --packages ml.dmlc:xgboost4j-spark:0.7 --jars jpmml-sparkml-package-1.1-SNAPSHOT.jar
Fitting and exporting an example pipeline model:
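import ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.RFormula
import org.jpmml.sparkml.ConverterUtil
// Load the Iris dataset, inferring column types from the data
val df = spark.read.option("header", "true").option("inferSchema", "true").csv("Iris.csv")
// Predict Species from all other columns; RFormula writes the default "features" and "label" columns
val formula = new RFormula().setFormula("Species ~ .")
// Multi-class XGBoost classifier over the 3 Iris species
val estimator = new XGBoostEstimator(Map("objective" -> "multi:softmax", "num_class" -> 3))
// Train for 11 boosting rounds (set mutates the estimator in place)
estimator.set(estimator.round, 11)
val pipeline = new Pipeline().setStages(Array(formula, estimator))
val pipelineModel = pipeline.fit(df)
// Convert the fitted pipeline, together with the input schema, into a PMML document
val pmmlBytes = ConverterUtil.toPMMLByteArray(df.schema, pipelineModel)
println(new String(pmmlBytes, "UTF-8"))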
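The example prints the PMML document to the console. To keep it, write the bytes to a file and sanity-check them as XML. The following is a minimal sketch that uses only the JDK. The PmmlFiles object and its methods are hypothetical helpers and are not part of JPMML-SparkML:

```scala
import java.io.ByteArrayInputStream
import java.nio.file.{Files, Path}
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical helper object for illustration; not part of JPMML-SparkML
object PmmlFiles {

  // Writes the PMML bytes to `path`, creating or truncating the file,
  // and returns the path that was written
  def save(pmmlBytes: Array[Byte], path: Path): Path =
    Files.write(path, pmmlBytes)

  // Parses the bytes as namespace-aware XML and returns the local name of the
  // document element; a well-formed PMML document yields "PMML"
  def rootElementName(pmmlBytes: Array[Byte]): String = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true)
    val document = factory.newDocumentBuilder().parse(new ByteArrayInputStream(pmmlBytes))
    document.getDocumentElement.getLocalName
  }
}
```

In the Spark shell, the object can be entered with :paste and then called as PmmlFiles.save(pmmlBytes, java.nio.file.Paths.get("XGBoostIris.pmml")). The file name is only an example. PmmlFiles.rootElementName(pmmlBytes) should return "PMML".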
JPMML-SparkML-XGBoost is licensed under the GNU Affero General Public License (AGPL) version 3.0. Other licenses are available on request.
Please contact info@openscoring.io.