RuleSet model does not implement regression method
Closed this issue · 7 comments
The RuleSetModelEvaluator only implements evaluateClassification, but not evaluateRegression. This means that scoring a RuleSetModel with functionName="regression", which is definitely allowed by the PMML standard, leads to a runtime error.
.. a RuleSetModel with functionName="regression", which is definitely allowed by the PMML standard ..
Care to provide some references?
More importantly, what software is producing regression-type RuleSetModel files? Do you have an example?
If you check https://dmg.org/pmml/v4-3/RuleSet.html#xsdElement_RuleSetModel, it states
<xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
and it is not clear to me why MINING-FUNCTION here cannot be regression. If only classification were allowed, why would this attribute exist? Also, the very top of the page states "Ruleset models can be thought of as flattened decision tree models." Since decision trees can be regression models, why can't Ruleset models?
This is not from some standard converter, but a custom model that uses Ruleset for post-processing of a regression model.
If only classification were allowed, why would this attribute exist?
The <Model>@functionName attribute is common to all model elements, whether they support only one mining function type or multiple. For example, the Scorecard model is a regression-type model; however, it too has a functionName attribute that, at least theoretically, can be set to alternative mining function types.
Also, the very top of the page states "Ruleset models can be thought of as flattened decision tree models."
Similarly, Scorecard models can be thought of as flattened decision tree models.
This is not from some standard converter, but a custom model that uses Ruleset for post-processing of a regression model.
That sounds like an interesting setup. You would normally post-process (aka calibrate) predictions using transformer-like elements (e.g. NormContinuous for isotonic regression), not model-like elements.
How else does your regression-type RuleSetModel element differ from a "classical" classification-type RuleSetModel element?
The JPMML-Evaluator library allows you to subclass the o.j.e.rule_set.RuleSetModelEvaluator class and provide the desired #evaluateRegression(ValueFactory, EvaluationContext) method implementation. If there is a demonstrable value/use case for it, then perhaps it will be possible to extract some common rule evaluation logic into a helper method.
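For illustration, a rough sketch of such a subclass. The subclass name is hypothetical, and the constructor arguments and the exact evaluateRegression signature (return type, field-name type) differ between JPMML-Evaluator versions, so treat them as assumptions to be checked against the version in use:

```java
import java.util.Map;

import org.dmg.pmml.PMML;
import org.dmg.pmml.rule_set.RuleSetModel;
import org.jpmml.evaluator.EvaluationContext;
import org.jpmml.evaluator.ValueFactory;
import org.jpmml.evaluator.rule_set.RuleSetModelEvaluator;

// Hypothetical subclass; the constructor and the evaluateRegression signature
// below are assumptions and may need adjusting to the JPMML-Evaluator version
// in use.
public class RegressionRuleSetModelEvaluator extends RuleSetModelEvaluator {

	public RegressionRuleSetModelEvaluator(PMML pmml, RuleSetModel ruleSetModel){
		super(pmml, ruleSetModel);
	}

	@Override
	protected <V extends Number> Map<String, ?> evaluateRegression(ValueFactory<V> valueFactory, EvaluationContext context){
		// Evaluate the rule set (for example, walk the SimpleRule elements,
		// pick the firing rule according to the rule selection method) and
		// return its score as a number under the target field name.
		throw new UnsupportedOperationException("Rule evaluation logic goes here");
	}
}
```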
@jrauch-pros Can you contact me privately via e-mail (as listed in my GitHub profile)?
I noticed that as a workaround I can just use functionName="classification" and still treat the output like I would a regression result. Looks odd, but seems to work OK. So I'm closing this.
I noticed that as a workaround I can just use functionName="classification" and still treat the output like I would a regression result.
Main differences from the JPMML-Evaluator API perspective:

- functionName="classification". The target field is an instance of org.jpmml.evaluator.Classification (the exact type is model type dependent), and the enclosed value is typically a java.lang.String. For example, if your model wants to return a PI value, then you'd be getting the string "3.14159" (your application will have to parse it into a number; see the sketch below).
- functionName="regression". The target field is an instance of org.jpmml.evaluator.Regression, and the enclosed value is some java.lang.Number.
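To make the workaround concrete, here is a minimal client-side sketch. It assumes the JPMML-Evaluator LoadingModelEvaluatorBuilder API with String-based field names (older versions use a FieldName wrapper instead), and the file name and input field are placeholders:

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.EvaluatorUtil;
import org.jpmml.evaluator.LoadingModelEvaluatorBuilder;

public class RuleSetWorkaroundDemo {

	public static void main(String... args) throws Exception {
		// Load the classification-type RuleSetModel (file name is a placeholder)
		Evaluator evaluator = new LoadingModelEvaluatorBuilder()
			.load(new File("ruleset.pmml"))
			.build();

		Map<String, Object> arguments = new LinkedHashMap<>();
		// arguments.put("x1", 1.0); // fill in the model's input fields here

		Map<String, ?> results = evaluator.evaluate(arguments);

		// The target value is a Classification-type object; EvaluatorUtil.decode
		// unwraps it to the winning category, which under this workaround is a
		// numeric string such as "3.14159".
		String targetName = evaluator.getTargetFields().get(0).getName();
		Object decoded = EvaluatorUtil.decode(results.get(targetName));

		// Parse the "category" back into a number
		double score = Double.parseDouble(String.valueOf(decoded));
		System.out.println(score);
	}
}
```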
If you don't want to re-implement the #evaluateRegression(ValueFactory, EvaluationContext) method from scratch, then at least you could re-package the target field value from o.j.e.Classification to o.j.e.Regression there.
I'm not looking to change the library because I don't control the system that does the actual scoring. So I'll make it work with what is implemented now. Thanks!