PMML results in R and Python are not the same

Question


Closed this issue 3 years ago · 1 comment

Hello, I built an XGBoost model in R and converted it to a PMML file. I then loaded this PMML file in Python and used it to predict data, but the predictions are not the same as in R. In addition, the PMML model in Python returns exactly the same result for every input. Could you please look into this? Thank you very much. My code is below; I held out the first and last rows of iris (rows 1 and 150) to validate the model.

R code for training the model and converting it to PMML:

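library("xgboost")
library("r2pmml")

data(iris)

# Features (4 numeric columns) and a 0-based integer class label
iris_X = iris[, 1:4]
iris_y = as.integer(iris[, 5]) - 1

# Hold out rows 1 (setosa) and 150 (virginica) for validation
iris_test <- iris_X[c(1, 150), ]
iris_test_y <- iris_y[c(1, 150)]

iris_X <- iris_X[-c(1, 150), ]
iris_y <- iris_y[-c(1, 150)]

# Numeric model matrices; column names stay "Sepal.Length", "Sepal.Width", ...
iris.matrix = model.matrix(~ . - 1, data = iris_X)
iris.matrix_test = model.matrix(~ . - 1, data = iris_test)

iris.DMatrix = xgb.DMatrix(iris.matrix, label = iris_y)
iris.DMatrix_test <- xgb.DMatrix(iris.matrix_test, label = iris_test_y)

# Feature map (feature names and types) that r2pmml uses when exporting
iris.fmap = as.fmap(iris.matrix)

iris.xgb = xgboost(data = iris.DMatrix,
                   missing = NULL,
                   objective = "multi:softprob",
                   num_class = 3,
                   nrounds = 13)

# Predict the held-out rows using all 13 boosting rounds;
# the flat probability vector is reshaped to one row per sample
xgb.pred = predict(iris.xgb, iris.DMatrix_test)
matrix(xgb.pred, nrow = 2, byrow = TRUE)

# Export to PMML; ntreelimit = 7 (analogous to predict()'s ntreelimit)
# exports only the first 7 of the 13 boosting rounds
r2pmml(iris.xgb, "iris_xgb.pmml", fmap = iris.fmap, response_name = "Species",
       response_levels = c("setosa", "versicolor", "virginica"),
       missing = NULL, ntreelimit = 7, compact = TRUE)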
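The fmap argument of r2pmml() is a feature map that tells the converter what to call each XGBoost feature. As a rough illustration, assuming the map follows XGBoost's plain-text feature-map layout (feature id, name and type on each line) and that the names come from colnames(iris.matrix), the map for this model would look like the output of the sketch below. make_fmap is a hypothetical helper, not part of r2pmml or XGBoost.

```python
# Hypothetical helper: render column names as an XGBoost text feature map.
# Each line is "<feature id>\t<feature name>\t<feature type>", with ids
# starting at 0 in column order; "q" marks a quantitative (continuous) feature.


def make_fmap(names, feature_type="q"):
    """Return feature-map text for the given column names, one feature per line."""
    lines = [f"{i}\t{name}\t{feature_type}" for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"


# Column names of the iris model matrix as R reports them (dots, not underscores).
iris_columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]
fmap_text = make_fmap(iris_columns)
print(fmap_text, end="")
```

Under this assumption the feature names carried into the PMML file are the R column names with dots, which is worth keeping in mind when building input records in another language.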

In R, the predicted probabilities for the two held-out rows are:

> round(matrix(xgb.pred, nrow = 2, byrow = TRUE), 2)
     [,1] [,2] [,3]
[1,] 0.98 0.01 0.01
[2,] 0.02 0.05 0.93
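With objective = "multi:softprob", XGBoost grows one tree per class in each boosting round, sums each class's leaf values into a margin, and converts the margins into probabilities with a softmax; predict() returns them as one flat vector, which is why the R code reshapes it with matrix(..., byrow = TRUE). The sketch below reproduces that arithmetic with made-up leaf values (softprob and reshape_by_row are hypothetical helpers, not XGBoost functions) and shows that keeping only the first 7 of 13 rounds, as ntreelimit = 7 appears to request, gives different probabilities from using all 13. A constant base score added to every class is omitted because it cancels in the softmax.

```python
import math


def softprob(leaf_values_per_round, n_rounds=None):
    """Softmax over per-class margins summed from the first n_rounds rounds.

    leaf_values_per_round[r][k] is the leaf value that round r's tree for
    class k assigns to one sample. n_rounds=None uses every round.
    """
    rounds = leaf_values_per_round[:n_rounds]
    n_classes = len(rounds[0])
    margins = [sum(r[k] for r in rounds) for k in range(n_classes)]
    top = max(margins)  # subtract the max margin for numerical stability
    exps = [math.exp(m - top) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]


def reshape_by_row(flat, n_classes):
    """Mimic R's matrix(pred, ncol = n_classes, byrow = TRUE) on a flat list."""
    return [flat[i:i + n_classes] for i in range(0, len(flat), n_classes)]


# Hypothetical leaf values for one sample: 13 rounds x 3 classes, shrinking each round.
leaves = [[0.4 * 0.8 ** r, -0.2 * 0.8 ** r, -0.2 * 0.8 ** r] for r in range(13)]

p_all = softprob(leaves)              # all 13 rounds, like predict() above
p_7 = softprob(leaves, n_rounds=7)    # first 7 rounds only
print([round(p, 3) for p in p_all])
print([round(p, 3) for p in p_7])
```

Because the later rounds keep pushing the margins apart, the truncated model is less confident about the leading class, so its probabilities cannot be expected to match the full model exactly.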

However, when I load the PMML file in Python with PyPMML, the results look like this:

from pypmml import Model

# Load the PMML file exported from R
model = Model.fromFile('iris_xgb.pmml')

# Held-out row 1 of iris (setosa)
result_1 = model.predict({
    "Sepal_Length" : 5.1,
    "Sepal_Width" : 3.5,
    "Petal_Length" : 1.4,
    "Petal_Width" : 0.2
})

# Held-out row 150 of iris (virginica)
result_150 = model.predict({
    "Sepal_Length" : 5.9,
    "Sepal_Width" : 3,
    "Petal_Length" : 5.1,
    "Petal_Width" : 1.8
})

The results are:

result_1
{'probability(setosa)': 0.8991017459122782, 
'probability(virginica)': 0.04937124673474952, 
'probability(versicolor)': 0.051527007352972234}

result_150
{'probability(setosa)': 0.8991017459122782, 
'probability(virginica)': 0.04937124673474952, 
'probability(versicolor)': 0.051527007352972234}
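Since model.predict receives a plain dictionary, it may be worth confirming that its keys match the field names declared in the PMML file's DataDictionary: the Python call above uses underscores (Sepal_Length), while the R iris columns use dots (Sepal.Length). The sketch below uses only the standard library to list the declared DataField names and report input keys with no matching field. It is a diagnostic idea under that assumption, not a confirmed explanation of the identical outputs; the helper names and the toy PMML fragment are hypothetical.

```python
import xml.etree.ElementTree as ET


def pmml_field_names(pmml_source):
    """Return the name attribute of every DataField element in a PMML document.

    The PMML namespace differs between versions, so match on the local tag name.
    """
    root = ET.fromstring(pmml_source)
    return [
        el.get("name")
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "DataField"
    ]


def unmatched_keys(pmml_source, record):
    """Keys of an input record that do not name any DataField in the PMML."""
    fields = set(pmml_field_names(pmml_source))
    return sorted(key for key in record if key not in fields)


# Hypothetical, heavily trimmed PMML fragment for illustration only.
toy_pmml = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="Species" optype="categorical" dataType="string"/>
    <DataField name="Sepal.Length" optype="continuous" dataType="float"/>
  </DataDictionary>
</PMML>"""

record = {"Sepal_Length": 5.1}
print(pmml_field_names(toy_pmml))
print(unmatched_keys(toy_pmml, record))
```

For the real file, the bytes returned by open("iris_xgb.pmml", "rb").read() can be passed in place of toy_pmml. How a scoring engine treats a key that matches no field is engine-specific, so a non-empty result is a lead to investigate rather than a verdict.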

Both rows receive identical probabilities, and neither matches the R output. It is very confusing. Thank you!

Answer 1 · 2021-09-06T16:33:31.000Z

Quoting from the question: from pypmml import Model

Sorry, this issue is not related to the JPMML software project in any way.