Primary language: Java. License: GNU Affero General Public License v3.0 (AGPL-3.0).

Spark PMML Exporter Validator

Using JPMML Evaluator to validate the PMML models exported from Apache Spark.

Installation

git clone https://github.com/selvinsource/spark-pmml-exporter-validator.git
cd spark-pmml-exporter-validator
sparkvalidatorpath="$PWD"
sparkshellpath="/home/myuser/git/spark/bin/spark-shell"
mvn clean compile assembly:single

Note:

  • Ensure the variable sparkshellpath points to your spark-shell executable
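
A quick sanity check can catch a wrong path before any script is run. The sketch below is an assumption and not part of this repository: check_spark_shell is a hypothetical helper name, and it only tests that the given path names an executable file.

```shell
# Hypothetical helper (not part of this repository): succeeds only if the
# given path points to a regular file that is executable.
check_spark_shell() {
  if [ -f "$1" ] && [ -x "$1" ]; then
    echo "spark-shell found: $1"
  else
    echo "spark-shell not found or not executable: $1" >&2
    return 1
  fi
}

# Usage, after setting sparkshellpath as shown in Installation:
# check_spark_shell "$sparkshellpath" || exit 1
```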

Documentation

For each supported Apache Spark MLlib algorithm there is a Scala script that builds a simple model and exports it to an XML file in PMML format.
The Scala script also runs model.predict on some test instances from the training dataset.
The Java evaluator (built on JPMML Evaluator and running as an application decoupled from Apache Spark) loads the exported PMML and runs predictions on the same test instances used for model.predict.
The predictions made by Apache Spark and JPMML Evaluator are comparable, which indicates that the PMML export from Apache Spark works as expected.
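
Checking that two sets of predictions are comparable amounts to verifying that paired values agree within a small tolerance. A minimal sketch of that idea, assuming (hypothetically) that each side's predictions have been written to a text file with one numeric value per line. The function name predictions_match, the file layout, and the default tolerance are illustrative assumptions, not how the validator itself reports its results.

```shell
# Hypothetical helper: compares two files of numeric predictions line by line.
# Succeeds when both files have the same number of lines and every pair of
# values differs by at most the tolerance (third argument, default 1e-6).
predictions_match() {
  paste "$1" "$2" | awk -v tol="${3:-1e-6}" '
    NF != 2 { bad = 1; exit }            # a value is missing on one side
    { d = $1 - $2; if (d < 0) d = -d }   # absolute difference of the pair
    d > tol + 0 { bad = 1; exit }        # pair differs by more than tol
    END { exit bad }
  '
}

# Usage (file names are placeholders):
# predictions_match spark_predictions.txt pmml_predictions.txt 1e-6 && echo "comparable"
```

An absolute tolerance is used here for simplicity; for cluster identifiers or class labels a tolerance of 0 would require exact equality.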

Datasets

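The following datasets have been used, as named in the exporter script filenames:

  • iris (K-Means Clustering)
  • winequalityred (Linear, Ridge and Lasso Regression)
  • breastcancerwisconsin (Linear SVM, Logistic Regression)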

K-Means Clustering

cd src/main/resources/spark_shell_exporter/
"$sparkshellpath" < kmeans_iris.scala
cd "$sparkvalidatorpath"
java -jar target/spark-pmml-exporter-validator-1.0.0-SNAPSHOT-jar-with-dependencies.jar KMeansModel

Linear Regression

cd src/main/resources/spark_shell_exporter/
"$sparkshellpath" < linearregression_winequalityred.scala
cd "$sparkvalidatorpath"
java -jar target/spark-pmml-exporter-validator-1.0.0-SNAPSHOT-jar-with-dependencies.jar LinearRegressionModel

Ridge Regression

cd src/main/resources/spark_shell_exporter/
"$sparkshellpath" < ridgeregression_winequalityred.scala
cd "$sparkvalidatorpath"
java -jar target/spark-pmml-exporter-validator-1.0.0-SNAPSHOT-jar-with-dependencies.jar RidgeRegressionModel

Lasso Regression

cd src/main/resources/spark_shell_exporter/
"$sparkshellpath" < lassoregression_winequalityred.scala
cd "$sparkvalidatorpath"
java -jar target/spark-pmml-exporter-validator-1.0.0-SNAPSHOT-jar-with-dependencies.jar LassoModel

Linear SVM

cd src/main/resources/spark_shell_exporter/
"$sparkshellpath" < linearsvm_breastcancerwisconsin.scala
cd "$sparkvalidatorpath"
java -jar target/spark-pmml-exporter-validator-1.0.0-SNAPSHOT-jar-with-dependencies.jar SVMModel

Logistic Regression

cd src/main/resources/spark_shell_exporter/
"$sparkshellpath" < logisticregression_breastcancerwisconsin.scala
cd "$sparkvalidatorpath"
java -jar target/spark-pmml-exporter-validator-1.0.0-SNAPSHOT-jar-with-dependencies.jar LogisticRegressionModel
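
All six runs above follow the same pattern, so they can be driven from a single loop. A minimal sketch, assuming sparkshellpath and sparkvalidatorpath are set as in Installation and the jar has already been built. The function name run_all_validations and the DRY_RUN switch are illustrative additions, not part of the repository.

```shell
# Hypothetical wrapper: for each algorithm, runs the exporter script in
# spark-shell and then the matching evaluator. Stops at the first failure.
# Set DRY_RUN=1 to print the commands instead of running them.
run_all_validations() {
  jar="target/spark-pmml-exporter-validator-1.0.0-SNAPSHOT-jar-with-dependencies.jar"
  # Each entry is "<exporter script>:<model name passed to the jar>".
  for pair in \
    kmeans_iris.scala:KMeansModel \
    linearregression_winequalityred.scala:LinearRegressionModel \
    ridgeregression_winequalityred.scala:RidgeRegressionModel \
    lassoregression_winequalityred.scala:LassoModel \
    linearsvm_breastcancerwisconsin.scala:SVMModel \
    logisticregression_breastcancerwisconsin.scala:LogisticRegressionModel
  do
    script=${pair%%:*}   # text before the colon
    model=${pair##*:}    # text after the colon
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$sparkshellpath < $script"
      echo "java -jar $jar $model"
    else
      # Subshells keep the caller's working directory unchanged.
      (cd "$sparkvalidatorpath/src/main/resources/spark_shell_exporter" &&
        "$sparkshellpath" < "$script") || return 1
      (cd "$sparkvalidatorpath" && java -jar "$jar" "$model") || return 1
    fi
  done
}

# Usage:
# DRY_RUN=1; run_all_validations   # preview the commands
# DRY_RUN=0; run_all_validations   # run every exporter and evaluator
```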