jpmml/r2pmml

XGBoost conversion issue

Kawalierus opened this issue · 10 comments

I have tried recreating the example provided in the README (https://github.com/jpmml/r2pmml#package-xgboost). Unfortunately, I receive the following error message:

SEVERE: Failed to convert RDS to PMML
java.lang.ClassCastException: org.jpmml.rexp.RStringVector cannot be cast to org.jpmml.rexp.RIntegerVector
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:283)
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
at org.jpmml.rexp.Main.run(Main.java:149)
at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.ClassCastException: org.jpmml.rexp.RStringVector cannot be cast to org.jpmml.rexp.RIntegerVector
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:283)
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
at org.jpmml.rexp.Main.run(Main.java:149)
at org.jpmml.rexp.Main.main(Main.java:97)
I have noticed that in a similar topic (https://www.gitmemory.com/issue/jpmml/r2pmml/64/718139304) you suggested that the label should be factorized. However, passing the label as a factor to the xgboost function causes this error: [17:10:50] amalgamation/../src/objective/multiclass_obj.cu:120: SoftmaxMultiClassObj: label must be in [0, num_class).

I initially tried to run the conversion on a Poisson count model and obtained an identical error (the prediction there is a frequency, so I do not see where strings would need to be converted to integers).

R version: 4.0.3
XGBoost version: 1.3.2.1
r2pmml version: 0.25.1

I would greatly appreciate support regarding this issue.

java.lang.ClassCastException: org.jpmml.rexp.RStringVector cannot be cast to org.jpmml.rexp.RIntegerVector
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:283)

This is line 283 of the XGBoostConverter source code:
https://github.com/jpmml/jpmml-r/blob/1.4.2/src/main/java/org/jpmml/rexp/XGBoostConverter.java#L283

As you can see, the converter expects the name attribute of the Feature Map (aka FMap) object to be an R factor. You've been giving it an R string instead.

To resolve this conversion error, simply change the type of the name attribute to R factor:

iris.fmap = as.fmap(iris.matrix)

# THIS!
iris.fmap$name = as.factor(iris.fmap$name)

#64

These two issues are close relatives, because they both reflect improper typing of FMap attributes (name here, type there).

All FMap attributes must be R factors.

Leaving this issue open for now - the r2pmml::as.fmap utility function should always force-cast all FMap attributes to R factors before returning the result to the end user.

Thank you very much for such a quick response.

Unfortunately, your advice has not resolved the issue in my case. Please see the code below:

library("xgboost")
library("r2pmml")

data(iris)

iris_X = iris[, 1:4]
iris_y = as.integer(iris[, 5]) - 1

iris.matrix = model.matrix(~ . - 1, data = iris_X)

iris.DMatrix = xgb.DMatrix(iris.matrix, label = iris_y)
iris.fmap = as.fmap(iris.matrix)
iris.fmap$name = as.factor(iris.fmap$name)
iris.xgb = xgboost(data = iris.DMatrix, missing = NULL, objective = "multi:softmax", num_class = 3, nrounds = 13)

r2pmml(iris.xgb, "iris_xgb.pmml", fmap = iris.fmap, response_name = "Species", response_levels = c("setosa", "versicolor", "virginica"), missing = NULL, ntreelimit = 7, compact = TRUE)

still results in the error:

SEVERE: Failed to convert RDS to PMML
java.lang.ClassCastException: org.jpmml.rexp.RStringVector cannot be cast to org.jpmml.rexp.RIntegerVector
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:284)
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
at org.jpmml.rexp.Main.run(Main.java:149)
at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.ClassCastException: org.jpmml.rexp.RStringVector cannot be cast to org.jpmml.rexp.RIntegerVector
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:284)
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
at org.jpmml.rexp.Main.run(Main.java:149)
at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.ClassCastException: org.jpmml.rexp.RStringVector cannot be cast to org.jpmml.rexp.RIntegerVector
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:284)

See - you fixed the type of the name attribute, and now the exception is happening one line later (284 instead of 283).

Please see my earlier resolution - "All FMap attributes must be R factors"

iris.fmap = as.fmap(iris.matrix)

iris.fmap$id = as.factor(iris.fmap$id)
iris.fmap$name = as.factor(iris.fmap$name)
iris.fmap$type = as.factor(iris.fmap$type)

My XGBoost example works fine with R 3.X.

Looks like R 4.X has botched matrix column types (were factors before, are strings now).
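
The thread does not pin down the exact cause. One change that fits this description (an assumption, not confirmed here) is that R 4.0.0 switched the default of stringsAsFactors in data.frame() from TRUE to FALSE. A minimal base-R sketch of the difference, using made-up variable names:

```r
feature_names = c("Sepal.Length", "Sepal.Width")

# Default behaviour: character column on R >= 4.0.0, factor column on R 3.X
df_default = data.frame(name = feature_names)
class(df_default$name)

# Explicit request: factor column regardless of the R version
df_factor = data.frame(name = feature_names, stringsAsFactors = TRUE)
class(df_factor$name)
```

If this is indeed the cause, it would explain why the same example behaves differently between R 3.X and R 4.X without any change to the package code.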

Thank you very much again. Apologies for not noticing this at first glance. We are making steady progress as I have reached another type of error by applying all your corrections.

Code:
library("xgboost")
library("r2pmml")

data(iris)

iris_X = iris[, 1:4]
iris_y = as.integer(iris[, 5]) - 1

iris.matrix = model.matrix(~ . - 1, data = iris_X)

iris.DMatrix = xgb.DMatrix(iris.matrix, label = iris_y)
iris.fmap = as.fmap(iris.matrix)

iris.fmap$id = as.factor(iris.fmap$id)
iris.fmap$name = as.factor(iris.fmap$name)
iris.fmap$type = as.factor(iris.fmap$type)

iris.xgb = xgboost(data = iris.DMatrix, missing = NULL, objective = "multi:softmax", num_class = 3, nrounds = 13)

r2pmml(iris.xgb, "iris_xgb.pmml", fmap = iris.fmap, response_name = "Species", response_levels = c("setosa", "versicolor", "virginica"), missing = NULL, ntreelimit = 7, compact = TRUE)

Error:

SEVERE: Failed to convert RDS to PMML
java.lang.IllegalArgumentException
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:295)
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
at org.jpmml.rexp.Main.run(Main.java:149)
at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.IllegalArgumentException
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:295)
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:265)
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:230)
at org.jpmml.rexp.XGBoostConverter.ensureFeatureMap(XGBoostConverter.java:209)
at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:66)
at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
at org.jpmml.rexp.Main.run(Main.java:149)
at org.jpmml.rexp.Main.main(Main.java:97)

java.lang.IllegalArgumentException
at org.jpmml.rexp.XGBoostConverter.loadFeatureMap(XGBoostConverter.java:295)

See for yourself:
https://github.com/jpmml/jpmml-r/blob/1.4.2/src/main/java/org/jpmml/rexp/XGBoostConverter.java#L295

Looks like it's not allowed to convert the id attribute to an R factor; leave it as an R integer.

iris.fmap = as.fmap(iris.matrix)

iris.fmap$name = as.factor(iris.fmap$name)
iris.fmap$type = as.factor(iris.fmap$type)
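
The casts above can be wrapped into a small helper. This is only a sketch: the name fix_fmap is hypothetical (it is not part of r2pmml), and the stand-in data frame assumes nothing beyond the three columns mentioned in this thread (id, name, type), with illustrative values:

```r
# Hypothetical helper, not part of r2pmml:
# cast 'name' and 'type' to factors; 'id' is deliberately left untouched.
fix_fmap = function(fmap){
	fmap$name = as.factor(fmap$name)
	fmap$type = as.factor(fmap$type)
	return (fmap)
}

# Stand-in for the result of as.fmap(iris.matrix); values are illustrative only
fmap = data.frame(
	id = 0:3,
	name = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
	type = rep("q", 4),
	stringsAsFactors = FALSE
)

fmap = fix_fmap(fmap)
str(fmap)
```

In a real session the helper would be applied to the output of as.fmap() before passing it to r2pmml() via the fmap argument.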

And we have reached another one:

SEVERE: Failed to convert RDS to PMML
java.lang.IllegalArgumentException: 1730313018.1919250021
at org.jpmml.xgboost.Learner.load(Learner.java:82)
at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:93)
at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:57)
at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:45)
at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:309)
at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:242)
at org.jpmml.rexp.XGBoostConverter.ensureLearner(XGBoostConverter.java:218)
at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:80)
at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
at org.jpmml.rexp.Main.run(Main.java:149)
at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.IllegalArgumentException: 1730313018.1919250021
at org.jpmml.xgboost.Learner.load(Learner.java:82)
at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:93)
at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:57)
at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:45)
at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:309)
at org.jpmml.rexp.XGBoostConverter.loadLearner(XGBoostConverter.java:242)
at org.jpmml.rexp.XGBoostConverter.ensureLearner(XGBoostConverter.java:218)
at org.jpmml.rexp.XGBoostConverter.encodeSchema(XGBoostConverter.java:80)
at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:70)
at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
at org.jpmml.rexp.Main.run(Main.java:149)
at org.jpmml.rexp.Main.main(Main.java:97)

java.lang.IllegalArgumentException: 1730313018.1919250021

Duplicate of jpmml/jpmml-xgboost#54

TL;DR: XGBoost 1.3.X switched its model saving data format from binary to JSON.

Two solutions:

  • Instruct XGBoost 1.3.X to save the model in the binary data format.
  • Downgrade XGBoost to 1.2.X.

@Kawalierus I've released R2PMML version 0.25.2 to GitHub, which is able to convert XGBoost 1.3(.3) models now.
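
To check whether an installation falls into the combination reported in this thread, a version comparison along these lines may help. The helper name needs_r2pmml_upgrade is hypothetical, and the thresholds are taken only from the versions mentioned above (XGBoost 1.3.X, R2PMML 0.25.2), so treat it as a rough sketch:

```r
# Hypothetical helper: TRUE when XGBoost is 1.3.0 or newer
# and r2pmml is older than 0.25.2 (the combination that failed in this thread).
needs_r2pmml_upgrade = function(xgboost_version, r2pmml_version){
	package_version(xgboost_version) >= "1.3.0" && package_version(r2pmml_version) < "0.25.2"
}

needs_r2pmml_upgrade("1.3.2.1", "0.25.1")  # TRUE
needs_r2pmml_upgrade("1.3.2.1", "0.25.2")  # FALSE

# With both packages installed, the real versions can be passed in as:
# needs_r2pmml_upgrade(as.character(packageVersion("xgboost")), as.character(packageVersion("r2pmml")))
```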