jpmml/jpmml-evaluator

JavaError: org.jpmml.evaluator.UndefinedResultException: Undefined result

brother-darion opened this issue · 6 comments

Hi Villu,

I got an error JavaError: org.jpmml.evaluator.UndefinedResultException: Undefined result when I evaluate.

I really need your help, thanks!

Here is the test code:

from jpmml_evaluator import make_evaluator  # version 0.10.3
evaluator = make_evaluator("model.pmml").verify()  # model saved by sklearn2pmml 0.87
evaluator.evaluate({'sepal_length': 5.9, 'sepal_width': 3.0, 'petal_length': 5.1, 'petal_width': 1.8, 'CUST_NO': 55}) 

Here is the error detail:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
File PythonUtil.java:110, in org.jpmml.evaluator.python.PythonUtil.evaluate()

File PythonUtil.java:132, in org.jpmml.evaluator.python.PythonUtil.evaluate()

File ModelEvaluator.java:300, in org.jpmml.evaluator.ModelEvaluator.evaluate()

File MiningModelEvaluator.java:224, in org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateInternal()

File ModelEvaluator.java:446, in org.jpmml.evaluator.ModelEvaluator.evaluateInternal()

File MiningModelEvaluator.java:303, in org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateClassification()

File MiningModelEvaluator.java:595, in org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateSegmentation()

File ModelEvaluator.java:446, in org.jpmml.evaluator.ModelEvaluator.evaluateInternal()

File RegressionModelEvaluator.java:207, in org.jpmml.evaluator.regression.RegressionModelEvaluator.evaluateClassification()

File RegressionModelUtil.java:96, in org.jpmml.evaluator.regression.RegressionModelUtil.computeMultinomialProbabilities()

File ValueUtil.java:53, in org.jpmml.evaluator.ValueUtil.normalizeSimpleMax()

Exception: Java Exception

The above exception was the direct cause of the following exception:

org.jpmml.evaluator.UndefinedResultException  Traceback (most recent call last)
File ~/.local/lib/python3.8/site-packages/jpmml_evaluator/__init__.py:154, in Evaluator.evaluate(self, arguments, nan_as_missing)
    153 try:
--> 154 	results = self.backend.staticInvoke("org.jpmml.evaluator.python.PythonUtil", "evaluate", self.javaEvaluator, arguments)
    155 except Exception as e:

File ~/.local/lib/python3.8/site-packages/jpmml_evaluator/jpype.py:38, in JPypeBackend.staticInvoke(self, className, methodName, *args)
     37 javaMember = getattr(javaClass, methodName)
---> 38 return javaMember(*args)

org.jpmml.evaluator.UndefinedResultException: org.jpmml.evaluator.UndefinedResultException: Undefined result

During handling of the above exception, another exception occurred:

JavaError                                 Traceback (most recent call last)
Cell In[47], line 1
----> 1 evaluator.evaluate({'sepal_length': 5.9, 'sepal_width': 3.0, 'petal_length': 5.1, 'petal_width': 1.8, 'CUST_NO': 55})

File ~/.local/lib/python3.8/site-packages/jpmml_evaluator/__init__.py:156, in Evaluator.evaluate(self, arguments, nan_as_missing)
    154 	results = self.backend.staticInvoke("org.jpmml.evaluator.python.PythonUtil", "evaluate", self.javaEvaluator, arguments)
    155 except Exception as e:
--> 156 	raise self.backend.toJavaError(e)
    157 results = self.backend.loads(results)
    158 if hasattr(self, "dropColumns"):

JavaError: org.jpmml.evaluator.UndefinedResultException: Undefined result

Here is the PMML file model.pmml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.4">
	<Header>
		<Application name="SkLearn2PMML package" version="0.87.0"/>
		<Timestamp>2024-08-02T07:03:52Z</Timestamp>
	</Header>
	<MiningBuildTask>
		<Extension name="repr">PMMLPipeline(steps=[('classifier', SGDClassifier(eta0=0.1, loss='modified_huber', random_state=123))])</Extension>
	</MiningBuildTask>
	<DataDictionary>
		<DataField name="IS_TARGET" optype="categorical" dataType="integer">
			<Value value="0"/>
			<Value value="1"/>
			<Value value="2"/>
		</DataField>
		<DataField name="sepal_length" optype="continuous" dataType="double"/>
		<DataField name="sepal_width" optype="continuous" dataType="double"/>
		<DataField name="petal_length" optype="continuous" dataType="double"/>
		<DataField name="petal_width" optype="continuous" dataType="double"/>
		<DataField name="CUST_NO" optype="continuous" dataType="double"/>
	</DataDictionary>
	<MiningModel functionName="classification" algorithmName="sklearn.linear_model._stochastic_gradient.SGDClassifier">
		<MiningSchema>
			<MiningField name="IS_TARGET" usageType="target"/>
			<MiningField name="sepal_length"/>
			<MiningField name="sepal_width"/>
			<MiningField name="petal_length"/>
			<MiningField name="petal_width"/>
			<MiningField name="CUST_NO"/>
		</MiningSchema>
		<Segmentation multipleModelMethod="modelChain" missingPredictionTreatment="returnMissing">
			<Segment id="1">
				<True/>
				<RegressionModel functionName="regression" normalizationMethod="logit">
					<MiningSchema>
						<MiningField name="sepal_length"/>
						<MiningField name="sepal_width"/>
						<MiningField name="petal_length"/>
						<MiningField name="petal_width"/>
						<MiningField name="CUST_NO"/>
					</MiningSchema>
					<Output>
						<OutputField name="decisionFunction(0)" optype="continuous" dataType="double" isFinalResult="false"/>
					</Output>
					<RegressionTable intercept="621.9669137171272">
						<NumericPredictor name="sepal_length" coefficient="168.86754701880753"/>
						<NumericPredictor name="sepal_width" coefficient="1955.182072829122"/>
						<NumericPredictor name="petal_length" coefficient="-3951.180472188851"/>
						<NumericPredictor name="petal_width" coefficient="-1791.1164465786242"/>
						<NumericPredictor name="CUST_NO" coefficient="8.003201280513126"/>
					</RegressionTable>
				</RegressionModel>
			</Segment>
			<Segment id="2">
				<True/>
				<RegressionModel functionName="regression" normalizationMethod="logit">
					<MiningSchema>
						<MiningField name="sepal_length"/>
						<MiningField name="sepal_width"/>
						<MiningField name="petal_length"/>
						<MiningField name="petal_width"/>
						<MiningField name="CUST_NO"/>
					</MiningSchema>
					<Output>
						<OutputField name="decisionFunction(1)" optype="continuous" dataType="double" isFinalResult="false"/>
					</Output>
					<RegressionTable intercept="-4526.783383641282">
						<NumericPredictor name="sepal_length" coefficient="-3994.5875985410025"/>
						<NumericPredictor name="sepal_width" coefficient="-5524.88528062124"/>
						<NumericPredictor name="petal_length" coefficient="3716.672549711752"/>
						<NumericPredictor name="petal_width" coefficient="1603.2474408754204"/>
						<NumericPredictor name="CUST_NO" coefficient="334.1569596423094"/>
					</RegressionTable>
				</RegressionModel>
			</Segment>
			<Segment id="3">
				<True/>
				<RegressionModel functionName="regression" normalizationMethod="logit">
					<MiningSchema>
						<MiningField name="sepal_length"/>
						<MiningField name="sepal_width"/>
						<MiningField name="petal_length"/>
						<MiningField name="petal_width"/>
						<MiningField name="CUST_NO"/>
					</MiningSchema>
					<Output>
						<OutputField name="decisionFunction(2)" optype="continuous" dataType="double" isFinalResult="false"/>
					</Output>
					<RegressionTable intercept="335.0962538783194">
						<NumericPredictor name="sepal_length" coefficient="2092.8652321630666"/>
						<NumericPredictor name="sepal_width" coefficient="-134.39033597584245"/>
						<NumericPredictor name="petal_length" coefficient="4592.676481691167"/>
						<NumericPredictor name="petal_width" coefficient="2120.0453001132446"/>
						<NumericPredictor name="CUST_NO" coefficient="-739.9018497546195"/>
					</RegressionTable>
				</RegressionModel>
			</Segment>
			<Segment id="4">
				<True/>
				<RegressionModel functionName="classification" normalizationMethod="simplemax">
					<MiningSchema>
						<MiningField name="IS_TARGET" usageType="target"/>
						<MiningField name="decisionFunction(0)"/>
						<MiningField name="decisionFunction(1)"/>
						<MiningField name="decisionFunction(2)"/>
					</MiningSchema>
					<RegressionTable intercept="0.0" targetCategory="0">
						<NumericPredictor name="decisionFunction(0)" coefficient="1.0"/>
					</RegressionTable>
					<RegressionTable intercept="0.0" targetCategory="1">
						<NumericPredictor name="decisionFunction(1)" coefficient="1.0"/>
					</RegressionTable>
					<RegressionTable intercept="0.0" targetCategory="2">
						<NumericPredictor name="decisionFunction(2)" coefficient="1.0"/>
					</RegressionTable>
				</RegressionModel>
			</Segment>
		</Segmentation>
	</MiningModel>
</PMML>

What is this - a trick question?

Pay attention to the RegressionModel@normalizationMethod attribute of the elementary regression models (there are three of them - one each for the "0", "1" and "2" category levels). It states logit, which means that the "raw" regression table value y_raw will be transformed using the inverse logit function 1d / (1d + Math.exp(-y_raw)).

For your example data record, the raw regression table values are as follows:

0: -15451.02228196098
1: -4449.997879464211
2: -1176.0400239623934

They have such great magnitudes that they "blow up" the Math.exp() function. Basically, any argument greater than roughly 710 yields Double.POSITIVE_INFINITY there, and these inputs (after negation) are far beyond that.

After the interim transformation, you have the following transformed values: {0 : 0, 1 : 0, 2 : 0}. You cannot compute probabilities for this value set using the simplemax classification normalization method, because division by zero is undefined. Hence, the JPMML-Evaluator library decides that it has gotten permanently stuck, and bails out of the situation by raising an o.j.e.UndefinedResultException.
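
To make the failure mode concrete, here is a minimal Python sketch that reproduces the arithmetic with the raw values quoted above. Note that Python's math.exp() raises OverflowError instead of returning infinity, so the Java behaviour is emulated explicitly; the 0.0 returned in that branch corresponds to 1d / (1d + Double.POSITIVE_INFINITY).

import math

# Raw regression table values for the example data record (categories 0, 1, 2)
raw = {0: -15451.02228196098, 1: -4449.997879464211, 2: -1176.0400239623934}

def inverse_logit(y_raw):
    # Mirrors the Java expression 1d / (1d + Math.exp(-y_raw))
    try:
        return 1.0 / (1.0 + math.exp(-y_raw))
    except OverflowError:
        # math.exp(-y_raw) overflowed towards +infinity, so the quotient is 0.0
        return 0.0

transformed = {category: inverse_logit(value) for category, value in raw.items()}
print(transformed)                 # {0: 0.0, 1: 0.0, 2: 0.0}

# simplemax normalization divides each value by the sum of all values:
print(sum(transformed.values()))   # 0.0 -> division by zero, hence UndefinedResultException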

What did you expect instead?

Closing as "works as intended".

The provided model needs to be re-trained (specifically, the CUST_NO column should be excluded from the training dataset, because it acts as a row identifier, not as a row feature).
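
For illustration only, a minimal re-training sketch along those lines might look like the following. The DataFrame name df and the CSV path are assumptions; the estimator settings come from the MiningBuildTask extension above.

import pandas as pd

from sklearn.linear_model import SGDClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# Assumption: the training data is available with the columns listed in the DataDictionary above
df = pd.read_csv("training_data.csv")  # hypothetical path

# Exclude CUST_NO - it is a row identifier, not a feature
feature_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

pipeline = PMMLPipeline([
    ("classifier", SGDClassifier(eta0=0.1, loss="modified_huber", random_state=123))
])
pipeline.fit(df[feature_cols], df["IS_TARGET"])

sklearn2pmml(pipeline, "model.pmml")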

I wonder what kind of predictions you were getting from the original Scikit-Learn model. Did it return Double#NaN? Would that have been a better result for you?

Scikit-Learn does not have/use any advanced error detection and recovery techniques. Even though the provided model is clearly incorrect (at least to the human eye), Scikit-Learn carries out all the requested computations dutifully and yields Double#NaN in the end. In contrast, (J)PMML understands that "things are not right" halfway into the computation, and bails out instantly with an error, instead of carrying out more meaningless work.

I think that the (J)PMML approach is better than the Scikit-Learn approach, especially in production systems - you get clear feedback that something is not right.

Thanks for the response! Yes, I agree that clear feedback that something is not right is valuable. But in a batch prediction scenario, I think it would be better to have an option to either raise an exception or set that record's result to NaN and keep predicting the others.

Oh, batch prediction only works in jpmml_evaluator (Python), right? By "batch prediction scenario" I mean using evaluator.evaluateAll in Python.

In a batch prediction scenario, I think it would be better to have an option to either raise an exception or set that record's result to NaN and keep predicting the others.

The Java interface o.j.e.Evaluator only supports single-row prediction mode via Evaluator#evaluate(Map).

The Python interface builds its batch prediction mode jpmml_evaluator.Evaluator.evaluateAll(DataFrame) on top of it. The main benefit of the batch interface is that all rows are exchanged between Python and Java in a single call (instead of many calls, one call per row).
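
A short usage sketch of the batch interface (the second data record is made up for illustration):

import pandas as pd

from jpmml_evaluator import make_evaluator

evaluator = make_evaluator("model.pmml").verify()

# One Python-to-Java round trip for the whole batch, instead of one call per row
arguments_df = pd.DataFrame([
    {'sepal_length': 5.9, 'sepal_width': 3.0, 'petal_length': 5.1, 'petal_width': 1.8, 'CUST_NO': 55},
    {'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4, 'petal_width': 0.2, 'CUST_NO': 1},
])
# With the current behaviour, a single problematic row (such as the first one here,
# taken from this issue) makes the whole call raise
results_df = evaluator.evaluateAll(arguments_df)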

Now, this is actually a good idea - JPMML-Evaluator-Python should provide an option for configuring what to do about an EvaluationException.

I can quickly think of two options:

  1. "return invalid" aka "as-is". Matches the current behaviour, where the Java exception is propagated to the top, and the evaluation is stopped at that location.
  2. "replace with NaN" aka "ignore". The Java component will catch a row-specific exception, and replaces the result for that row with Double#NaN (or some other user-specified constant?).

Also, in "return invalid" aka "as-is" mode, it should be possible to configure whether partial results can be returned or not. Suppose there is a batch of 10'000 rows, and the evaluation fails on row 8566 because of some data input error. I think it might make sense to return the leading 8565 results in that case.
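
Until such an option exists, the "replace with NaN" behaviour can be approximated on the Python side with a per-row loop (slower than evaluateAll(), because every row is a separate Python-to-Java call; the helper name evaluate_or_nan is made up):

import pandas as pd

from jpmml_evaluator import make_evaluator

evaluator = make_evaluator("model.pmml").verify()

def evaluate_or_nan(arguments):
    # Per-row error handling: a failed row (e.g. UndefinedResultException) yields
    # an empty result dict, which pandas turns into an all-NaN row in the results frame
    try:
        return evaluator.evaluate(arguments)
    except Exception:
        return {}

arguments_df = pd.DataFrame([
    {'sepal_length': 5.9, 'sepal_width': 3.0, 'petal_length': 5.1, 'petal_width': 1.8, 'CUST_NO': 55},
    {'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4, 'petal_width': 0.2, 'CUST_NO': 1},
])
results_df = pd.DataFrame([evaluate_or_nan(row.to_dict()) for _, row in arguments_df.iterrows()])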

@brother-darion See jpmml/jpmml-evaluator-python#26, and let's continue this discussion there - it's specific to the Python interface.