jpmml-evaluator does not handle null values when used in Java?
liuhuanshuo opened this issue · 1 comment
I have a PMML file that encapsulates a LightGBM (lgb) model.
When I call this PMML file from Python, the code looks like this:
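```python
from jpmml_evaluator import make_evaluator
from jpmml_evaluator.py4j import launch_gateway, Py4JBackend

pmml_file = "risk_model.pmml"

# Launch a Py4J gateway (a JVM that runs the JPMML-Evaluator Java library)
gateway = launch_gateway()
backend = Py4JBackend(gateway)

evaluator = make_evaluator(backend, pmml_file)
# x_test is a pandas.DataFrame of input features (loading code not shown)
results = evaluator.evaluateAll(x_test)
```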
Note that even if x_test contains null values, the results are computed correctly.
But if I call the same model from Java and pass it the same x_test data, I get an error.
It seems like Java is handling null values incorrectly?
It is important to understand that the `jpmml_evaluator` Python package is a thin Python wrapper around the JPMML-Evaluator Java library.
Therefore, if the Python code is able to make correct predictions, then the Java code must be able to do so too, because that's where the actual computation happens every time.
> Note that if x_test here contains a null value
The source of the error is your data loading/preparation layer (aka ETL layer).
In Python the data is loaded using `pandas.DataFrame`, but in Java it is some custom data container class? My explanation is that `pandas.DataFrame` performs some automatic data sanitation, which currently does not have a Java equivalent.
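For illustration, here is a rough sketch of what a comparable sanitation step could look like on the Java side, applied while turning raw CSV text into an argument map. It assumes plain comma-separated input without quoting, and the class and method names (`CsvRecordLoader`, `toArguments`) are hypothetical. The rule that blank cells become `null` is an assumption in the spirit of this thread, not a port of pandas' actual missing-value detection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvRecordLoader {

	// Hypothetical helper: builds an argument map from one CSV header line and one data line.
	// Assumes plain comma-separated values (no quoting, no escaped commas).
	public static Map<String, String> toArguments(String headerLine, String dataLine){
		// A limit of -1 keeps trailing empty cells, so "1," yields two cells
		String[] names = headerLine.split(",", -1);
		String[] cells = dataLine.split(",", -1);

		if(names.length != cells.length){
			throw new IllegalArgumentException("Expected " + names.length + " cells, got " + cells.length);
		}

		Map<String, String> arguments = new LinkedHashMap<>();
		for(int i = 0; i < names.length; i++){
			String cell = cells[i];
			// Empty and all-whitespace cells become missing values (null); other cells are kept as-is
			arguments.put(names[i].trim(), cell.trim().isEmpty() ? null : cell);
		}
		return arguments;
	}

	public static void main(String[] args){
		Map<String, String> arguments = toArguments("a,b,c", "1, ,3");
		// prints {a=1, b=null, c=3}
		System.out.println(arguments);
	}
}
```

Real CSV files with quoted fields need a proper CSV parser. The sketch only shows where the blank-to-`null` conversion belongs: in the loading layer, before the values reach the evaluator.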
> Seems like java is handling null values incorrectly?
See the Java exception stack trace: the current complaint is that the "d15_once_called_opst_phone_cnt" field cannot accept a one-character whitespace string as a valid argument value.
It looks like your Java code should automatically convert all-whitespace strings to `null` references to make everything work:
```java
String string = (String)arguments.get("d15_once_called_opst_phone_cnt");
// Treat all-whitespace strings as missing values (skip values that are already null)
if(string != null && (string.trim()).isEmpty()){
	// In Java, a missing value is represented using the `null` literal
	string = null;
	arguments.put("d15_once_called_opst_phone_cnt", string);
}
```
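The same rule can be applied to every field at once, so that other all-whitespace columns do not trigger the same error one field at a time. A minimal sketch, assuming the arguments live in a mutable `Map<String, Object>` that permits `null` values (such as `HashMap`). The class and method names (`ArgumentSanitizer`, `blankStringsToNull`) and the `other_field` key are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class ArgumentSanitizer {

	// Hypothetical helper: applies the "all-whitespace string means missing" rule to every field.
	// Non-String values (numbers, entries that are already null) are left unchanged.
	public static void blankStringsToNull(Map<String, Object> arguments){
		arguments.replaceAll((name, value) -> {
			if(value instanceof String && ((String)value).trim().isEmpty()){
				// In Java, a missing value is represented using the null literal
				return null;
			}
			return value;
		});
	}

	public static void main(String[] args){
		Map<String, Object> arguments = new HashMap<>();
		arguments.put("d15_once_called_opst_phone_cnt", " ");
		arguments.put("other_field", "7"); // hypothetical second field
		blankStringsToNull(arguments);
		// prints "null"
		System.out.println(arguments.get("d15_once_called_opst_phone_cnt"));
	}
}
```

The key is kept with a `null` value rather than removed. `Map.get` returns `null` in both cases, so the field reads as missing either way.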