Quantco/spox

Missing reference implementation aborts graph definition process

Closed this issue · 3 comments

The following test case:

def test_error_from_ref_impl():
    import numpy

    import spox.opset.ai.onnx.v17 as op
    import spox.opset.ai.onnx.ml.v3 as ml

    ml.label_encoder(
        op.const(numpy.array(["foo"])),
        keys_strings=["foo"],
        values_int64s=[42],
        default_int64=-1,
    )

raises the following error:

_____________________________________ test_error_from_ref_impl _____________________________________

    def test_error_from_ref_impl():
        import spox.opset.ai.onnx.v17 as op
        import spox.opset.ai.onnx.ml.v3 as ml
>       ml.label_encoder(
            op.const(numpy.array(["foo"])),
            keys_strings=["foo"],
            values_int64s=[42],
            default_int64=-1,
        )

tests/test_value_propagation.py:139: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/spox/opset/ai/onnx/ml/v3.py:1108: in label_encoder
    return _LabelEncoder(
src/spox/_node.py:148: in __init__
    output_values = self.propagate_values() if propagate_values else {}
src/spox/_standard.py:216: in propagate_values
    return self.propagate_values_onnx()
src/spox/_standard.py:202: in propagate_values_onnx
    output_feed = run(model, input_feed)
src/spox/_value_prop.py:135: in _run_reference_implementation
    session = onnx.reference.ReferenceEvaluator(model)
../../../micromamba/envs/quantcore-learn/lib/python3.10/site-packages/onnx/reference/reference_evaluator.py:227: in __init__
    self._init()
../../../micromamba/envs/quantcore-learn/lib/python3.10/site-packages/onnx/reference/reference_evaluator.py:312: in _init
    cl = self._load_impl(node)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <onnx.reference.reference_evaluator.ReferenceEvaluator object at 0x11c682bf0>
node = input: "X"
output: "Y"
name: "_this_"
op_type: "LabelEncoder"
attribute {
  name: "default_float"
  f: -0.0
  type: FL...
  strings: "foo"
  type: STRINGS
}
attribute {
  name: "values_int64s"
  ints: 42
  type: INTS
}
domain: "ai.onnx.ml"

input_types = None

    def _load_impl(
        self, node: NodeProto, input_types: Optional[TypeProto] = None
    ) -> Any:
        """
        Loads the implementation for a specified runtime.
        """
        if node.domain not in self.opsets:
            raise RuntimeError(
                f"Domain {node.domain!r} (node type: {node.op_type!r}) "
                f"is not specified. Known opsets: {self.opsets!r}."
            )
        version = self.opsets[node.domain]
        key = node.domain, node.op_type
        if key in self.new_ops_:
            # This operator has a custom implementation.
            # This mechanism can be used to implement a custom onnx node
            # or to overwrite an existing one.
            return self.new_ops_[key]
    
        if node.domain == "":
            from .ops import load_op
    
            try:
                return load_op(node.domain, node.op_type, version)
            except RuntimeContextError:
                if input_types is None:
                    raise
                return load_op(
                    node.domain,
                    node.op_type,
                    version,
                    node=node,
                    input_types=input_types,
                )
    
        if node.domain == "ai.onnx.preview.training":
            from .ops.aionnx_preview_training import load_op as load_op_pt
    
            return load_op_pt(node.domain, node.op_type, version)
    
        if node.domain == "experimental":
            from .ops.experimental import load_op as load_op_exp
    
            return load_op_exp(node.domain, node.op_type, version)
    
        # It has to be a function.
        if key in self.functions_:
            from .ops import load_op
    
            impl = self.functions_[key]
            return load_op(node.domain, node.op_type, version, custom=impl)
>       raise NotImplementedError(
            f"Node type {node.op_type!r} from domain {node.domain!r} "
            f"is unknown, known functions: {list(sorted(self.functions_))}."
        )
E       NotImplementedError: Node type 'LabelEncoder' from domain 'ai.onnx.ml' is unknown, known functions: [].

../../../micromamba/envs/quantcore-learn/lib/python3.10/site-packages/onnx/reference/reference_evaluator.py:379: NotImplementedError

Setting spox._standard._VALUE_PROP_BACKEND = spox._standard.ValuePropBackend.NONE fixes the error for now.
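For reference, the workaround looks like this (it sets a private variable and is not part of the public interface, so it should only be a temporary measure):

import spox._standard

# Disable value propagation entirely so no reference implementation is needed.
spox._standard._VALUE_PROP_BACKEND = spox._standard.ValuePropBackend.NONE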

Instead of raising an error, we should gracefully give up on value propagation.

This should be a somewhat simple fix.

I'm not sure whether what I have in mind is too short-sighted, though. I think the easiest way would be to catch NotImplementedError and return {} in _run_reference_implementation. It would fail silently, but value propagation isn't really meant to be reliable, as it's still not part of the public interface. It is rather unfortunate that we would lose the ability to have it throw errors when we do want them, though.
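A rough sketch of that idea, assuming _run_reference_implementation is a thin wrapper around ReferenceEvaluator as the traceback suggests (the signature and body below are reconstructed, not spox's actual code):

import onnx
import onnx.reference


def _run_reference_implementation(model, input_feed):
    # Hypothetical shape of the helper, reconstructed from the traceback above.
    try:
        session = onnx.reference.ReferenceEvaluator(model)
        results = session.run(None, input_feed)
    except NotImplementedError:
        # The reference evaluator has no kernel for some operator in the model
        # (here: LabelEncoder from ai.onnx.ml) -- give up on value propagation.
        return {}
    return dict(zip(session.output_names, results))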

Possible alternatives: should this always fail silently? Or raise a warning? Catch all possible exceptions? Can we enumerate the more useful ones (well, more likely for ONNX Runtime)? Catch all errors by default and propagate when a flag is used?

Catch all errors by default and propagate when a flag is used?

This is my preference. Value propagation works on a best-effort principle, and if it fails, it just is what it is for now. It should not interrupt or even warn the average user. The only way a user could address such a warning at this point is by setting private variables, which is not something we want to encourage.
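A hypothetical sketch of that behaviour (the flag name and wrapper are illustrative, not spox's actual API):

# Illustrative only: a module-level flag deciding whether value-propagation
# failures are re-raised (for debugging) or swallowed (the default).
_RAISE_VALUE_PROP_ERRORS = False


def _propagate_values_best_effort(propagate, *args, **kwargs):
    try:
        return propagate(*args, **kwargs)
    except Exception:
        if _RAISE_VALUE_PROP_ERRORS:
            raise
        # Best effort: skip value propagation for this node and move on.
        return {}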

Closed by Quantco/spox-archive#74