jpmml/jpmml-evaluator

Emulating "multi-output" models

EVOk06 opened this issue · 8 comments

My linear regression model has five target fields (y1-y5) and 21 input fields (x1-x21) in the PMML file. When I invoke evaluator.evaluate(...), it throws the following exception:

Exception in thread "main" org.jpmml.evaluator.EvaluationException: Field sets ["y1"] and ["y2"] do not match
at org.jpmml.evaluator.mining.MiningModelEvaluator.selectAll(MiningModelEvaluator.java:870)
at org.jpmml.evaluator.mining.MiningModelEvaluator.getSegmentationResult(MiningModelEvaluator.java:690)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateRegression(MiningModelEvaluator.java:246)
at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:446)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateInternal(MiningModelEvaluator.java:237)
at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:302)

Why does this problem occur, and how should I fix it? Thanks.

My linear regression model has five target fields (y1-y5) and 21 input fields (x1-x21) in the PMML file.

Looks like you're trying to implement "multi-output model" functionality using the Segmentation@multipleModelMethod="selectAll" aggregation method.

It is my interpretation of the PMML specification that this aggregation method is not exactly designed for this purpose:
http://dmg.org/pmml/v4-4-1/MultipleModels.html

The selectAll aggregator would work if your ensemble model contained five models that all define the same y target field. In that case, the predicted value of the "multi-output model" would be a java.util.List instance with five elements (corresponding to the current y1, y2, ..., y5 values).
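
To illustrate the idea, here is a toy sketch of a selectAll-style aggregation in plain Java. It is not the JPMML-Evaluator implementation; the class name SelectAllSketch and its method are made up for this example. It only shows why segments that all report y can be merged into a list, while segments reporting y1 and y2 cannot:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of JPMML-Evaluator.
class SelectAllSketch {

    // Collects the value of each field into a list, one element per segment.
    // Every segment result must expose exactly the same field names.
    static Map<String, List<Object>> selectAll(List<? extends Map<String, ?>> segmentResults) {
        Map<String, List<Object>> aggregated = new LinkedHashMap<>();
        Set<String> expectedFields = null;

        for (Map<String, ?> segmentResult : segmentResults) {
            Set<String> actualFields = segmentResult.keySet();

            if (expectedFields == null) {
                // The first segment fixes the field set for all others.
                expectedFields = actualFields;
            } else if (!expectedFields.equals(actualFields)) {
                throw new IllegalStateException(
                    "Field sets " + expectedFields + " and " + actualFields + " do not match");
            }

            for (Map.Entry<String, ?> entry : segmentResult.entrySet()) {
                aggregated.computeIfAbsent(entry.getKey(), key -> new ArrayList<>())
                    .add(entry.getValue());
            }
        }

        return aggregated;
    }

    public static void main(String[] args) {
        // Two segments that both predict "y": the values are merged into one list.
        System.out.println(selectAll(List.of(Map.of("y", 1.0), Map.of("y", 2.0))));

        // Two segments that predict "y1" and "y2": the field sets differ, so this throws.
        try {
            selectAll(List.of(Map.of("y1", 1.0), Map.of("y2", 2.0)));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this toy model, the second call fails for the same reason as the reported exception: the per-segment field sets are compared before any values are combined.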

I have asked DMG.org to clarify the implementation of "multi-output models". I proposed defining a dedicated multiModelChain aggregation method:
http://mantis.dmg.org/view.php?id=233

Unfortunately, DMG.org hasn't been paying any attention to this topic in the past ~1.5 years.

I will probably proceed on my own and implement multiModelChain independently in the JPMML-Evaluator 1.6.X development branch.

TLDR: As a quick workaround, I'd suggest renaming all target fields to y (while keeping the selectAll aggregation method).

In that case, the evaluation should succeed, and you would receive a 5-element java.util.List as the predicted value, which you can then unpack in your Java application code.
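
A minimal sketch of the unpacking step, in plain Java. The class name MultiOutputUnpacker and the output names y1 to y5 are hypothetical, and the sketch assumes that the list elements arrive in the same order as the segments in the PMML file. The List in main is a stand-in for the value stored under y in the map returned by evaluator.evaluate(...); depending on the JPMML-Evaluator version, the raw value may need decoding first.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of JPMML-Evaluator.
class MultiOutputUnpacker {

    // Assumed output names, listed in the same order as the segments.
    static final List<String> OUTPUT_NAMES = List.of("y1", "y2", "y3", "y4", "y5");

    // Maps the list-valued prediction of the shared "y" target back to named outputs.
    static Map<String, Double> unpack(Object predicted) {
        if (!(predicted instanceof List<?>)) {
            throw new IllegalArgumentException("Expected a java.util.List, got: " + predicted);
        }

        List<?> values = (List<?>) predicted;
        if (values.size() != OUTPUT_NAMES.size()) {
            throw new IllegalArgumentException(
                "Expected " + OUTPUT_NAMES.size() + " values, got " + values.size());
        }

        Map<String, Double> outputs = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            // Each element is expected to be numeric for a regression model.
            outputs.put(OUTPUT_NAMES.get(i), ((Number) values.get(i)).doubleValue());
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Stand-in for the "y" entry of the evaluation result.
        Object predicted = List.of(0.5, 1.5, 2.5, 3.5, 4.5);
        System.out.println(unpack(predicted));
    }
}
```

The size check is there so that a changed number of segments fails loudly instead of silently shifting values between outputs.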

@EVOk06 I'll close this issue when I've implemented (x-)multiModelChain support in the JPMML-Evaluator 1.6.X development branch.

In the meantime, you may report back if renaming target fields solved your problem.

Also, I'd be interested in hearing if there are more "feature requests" around this functionality, in order to make sure that the upcoming design and implementation of the (x-)multiModelChain aggregation method will be maximally useful.

Thanks for the help.
We aggregated again with all targets named 'y'. The PMML file now looks like this:

<PMML xmlns="http://www.dmg.org/PMML-4_4" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.4">
    <Header>
		<Application name="JPMML-SkLearn" version="1.6.31"/>
		<Timestamp>2021-11-22T10:06:39Z</Timestamp>
	</Header>
	<DataDictionary>
		<DataField name="y" optype="continuous" dataType="double"/>
		<DataField name="y" optype="continuous" dataType="double"/>
		<DataField name="y" optype="continuous" dataType="double"/>
		<DataField name="y" optype="continuous" dataType="double"/>
		<DataField name="y" optype="continuous" dataType="double"/>
		<DataField name="x1" optype="continuous" dataType="float"/>
		<DataField name="x2" optype="continuous" dataType="float"/>
		<DataField name="x3" optype="continuous" dataType="float"/>
		<DataField name="x4" optype="continuous" dataType="float"/>
		<DataField name="x5" optype="continuous" dataType="float"/>
		<DataField name="x6" optype="continuous" dataType="float"/>
		<DataField name="x7" optype="continuous" dataType="float"/>
		<DataField name="x8" optype="continuous" dataType="float"/>
		<DataField name="x9" optype="continuous" dataType="float"/>
		<DataField name="x10" optype="continuous" dataType="float"/>
		<DataField name="x11" optype="continuous" dataType="float"/>
		<DataField name="x12" optype="continuous" dataType="float"/>
		<DataField name="x13" optype="continuous" dataType="float"/>
		<DataField name="x14" optype="continuous" dataType="float"/>
		<DataField name="x15" optype="continuous" dataType="float"/>
		<DataField name="x16" optype="continuous" dataType="float"/>
		<DataField name="x17" optype="continuous" dataType="float"/>
		<DataField name="x18" optype="continuous" dataType="float"/>
		<DataField name="x19" optype="continuous" dataType="float"/>
		<DataField name="x20" optype="continuous" dataType="float"/>
		<DataField name="x21" optype="continuous" dataType="float"/>
	</DataDictionary>
    <MiningModel functionName="regression">
        <MiningSchema>
            <MiningField name="y" usageType="target"/>
            <MiningField name="y" usageType="target"/>
            <MiningField name="y" usageType="target"/>
            <MiningField name="y" usageType="target"/>
            <MiningField name="y" usageType="target"/>
            <MiningField name="x1"/>
            <MiningField name="x2"/>
            <MiningField name="x3"/>
            <MiningField name="x4"/>
            <MiningField name="x5"/>
            <MiningField name="x6"/>
            <MiningField name="x7"/>
            <MiningField name="x8"/>
            <MiningField name="x9"/>
            <MiningField name="x10"/>
            <MiningField name="x11"/>
            <MiningField name="x12"/>
            <MiningField name="x13"/>
            <MiningField name="x14"/>
            <MiningField name="x15"/>
            <MiningField name="x16"/>
            <MiningField name="x17"/>
            <MiningField name="x18"/>
            <MiningField name="x19"/>
            <MiningField name="x20"/>
            <MiningField name="x21"/>
        </MiningSchema>
        <Segmentation multipleModelMethod="selectAll">
            <Segment id='30m'>
                <True/>
                <MiningModel functionName="regression" id="tpl">
                    <MiningSchema>
                        <MiningField name="y" usageType="target"/>
                        <MiningField name="x1"/>
                        <MiningField name="x2"/>
                        <MiningField name="x3"/>
                        <MiningField name="x4"/>
                        <MiningField name="x5"/>
                        <MiningField name="x6"/>
                        <MiningField name="x7"/>
                        <MiningField name="x8"/>
                        <MiningField name="x9"/>
                        <MiningField name="x10"/>
                        <MiningField name="x11"/>
                        <MiningField name="x12"/>
                        <MiningField name="x13"/>
                        <MiningField name="x14"/>
                        <MiningField name="x15"/>
                        <MiningField name="x16"/>
                        <MiningField name="x17"/>
                        <MiningField name="x18"/>
                        <MiningField name="x19"/>
                        <MiningField name="x20"/>
                        <MiningField name="x21"/>
                    </MiningSchema>
                    <Segmentation multipleModelMethod="selectFirst">
                    </Segmentation>
                </MiningModel>
            </Segment>
            <Segment id='1h'>
                <True/>
            </Segment>
            <Segment id='6h'>
                <True/>
            </Segment>
            <Segment id='12h'>
                <True/>
            </Segment>
            <Segment id='1d'>
                <True/>
            </Segment>
        </Segmentation>
    </MiningModel>
</PMML>

But when I load the model, it throws another exception:

org.jpmml.evaluator.DuplicateFieldException: Field "y" has already been defined
at org.jpmml.evaluator.visitors.AbstractParser.resolveTargetDataType(AbstractParser.java:108)
at org.jpmml.evaluator.visitors.TargetCategoryParser.processModel(TargetCategoryParser.java:313)
at org.jpmml.evaluator.visitors.TargetCategoryParser.processMiningModel(TargetCategoryParser.java:289)
at org.jpmml.evaluator.visitors.TargetCategoryParser.pushParent(TargetCategoryParser.java:90)
at org.dmg.pmml.mining.MiningModel.accept(MiningModel.java:338)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
at org.dmg.pmml.mining.Segment.accept(Segment.java:235)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
at org.dmg.pmml.mining.Segmentation.accept(Segmentation.java:185)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:69)
at org.dmg.pmml.mining.MiningModel.accept(MiningModel.java:349)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:90)
at org.dmg.pmml.mining.Segment.accept(Segment.java:235)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
at org.dmg.pmml.mining.Segmentation.accept(Segmentation.java:185)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:69)
at org.dmg.pmml.mining.MiningModel.accept(MiningModel.java:349)
at org.dmg.pmml.PMMLObject.traverse(PMMLObject.java:108)
at org.dmg.pmml.PMML.accept(PMML.java:240)
at org.jpmml.model.visitors.AbstractVisitor.applyTo(AbstractVisitor.java:320)
at org.jpmml.model.visitors.VisitorBattery.applyTo(VisitorBattery.java:26)
at org.jpmml.evaluator.LoadingModelEvaluatorBuilder.load(LoadingModelEvaluatorBuilder.java:121)
at org.jpmml.evaluator.LoadingModelEvaluatorBuilder.load(LoadingModelEvaluatorBuilder.java:83)

We aggregated again with all targets named 'y'

In your /PMML/DataDictionary element you should have exactly one declaration of the <DataField name="y" optype="continuous" dataType="double"/> element, not five.

The idea is that all five child models will be referencing this one target field declaration.

Sorry, we don't know how to reference five models from one field. Could you please show us the docs or a usage example for this idea?

Sorry, we don't know how to reference five models from one field.

The referencing takes place automatically. Right now, the JPMML-Evaluator sees five identical y declarations, and cannot figure out which is the correct one.

You can't have five identical variable declarations in your Java/Scala code block:

double y;
// THIS IS NOT ALLOWED - the compiler will complain about a duplicate local variable declaration.
double y;

Could you please show us the docs or a usage example for this idea?

Simply keep the first DataField@name="y" declaration, and remove the other four declarations.
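
If you would rather not edit the file by hand, the same cleanup can be sketched with the JDK's built-in DOM API. This is only an illustration: the class name DataFieldDeduplicator is made up, the sketch keeps the first DataField for each name and drops later ones, and it does not check that the dropped declarations are actually identical to the kept one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; not part of JPMML-Evaluator.
class DataFieldDeduplicator {

    // Parses an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Keeps the first DataField per name, removes the rest, and returns the number removed.
    static int removeDuplicateDataFields(Document document) {
        NodeList dataFields = document.getElementsByTagName("DataField");

        // The NodeList is live, so take a snapshot before removing nodes.
        List<Element> snapshot = new ArrayList<>();
        for (int i = 0; i < dataFields.getLength(); i++) {
            snapshot.add((Element) dataFields.item(i));
        }

        Set<String> seenNames = new HashSet<>();
        int removed = 0;
        for (Element dataField : snapshot) {
            // Set.add returns false when the name was already present.
            if (!seenNames.add(dataField.getAttribute("name"))) {
                dataField.getParentNode().removeChild(dataField);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        String xml = "<DataDictionary>"
            + "<DataField name=\"y\" optype=\"continuous\" dataType=\"double\"/>"
            + "<DataField name=\"y\" optype=\"continuous\" dataType=\"double\"/>"
            + "<DataField name=\"y\" optype=\"continuous\" dataType=\"double\"/>"
            + "<DataField name=\"x1\" optype=\"continuous\" dataType=\"float\"/>"
            + "</DataDictionary>";
        Document document = parse(xml);
        System.out.println("Removed: " + removeDuplicateDataFields(document));
    }
}
```

Writing the cleaned document back to disk (for example with javax.xml.transform) is left out to keep the sketch short.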

Thanks, we solved this problem and aggregated all targets under y.