microsoft/onnxconverter-common

convert_float_to_float16() produces a model that causes ValidationError with onnx.checker.check_model()

SergeySandler opened this issue · 6 comments

With ONNX 1.13.1, an fp32 model passes onnx.checker.check_model() without warnings or errors,

import onnx
onnx_model = onnx.load("/models/ResNet50.onnx")
onnx.checker.check_model(onnx_model)

but after the model is converted to fp16, onnx.checker.check_model()

from onnxconverter_common import float16
onnx_model_fp16 = float16.convert_float_to_float16(onnx_model, keep_io_types=True)
import warnings
warnings.filterwarnings("ignore")
onnx.checker.check_model(onnx_model_fp16)

triggers a ValidationError:

ValidationError: Nodes in a graph must be topologically sorted, however input 'graph_input_cast_0' of node: name: StatefulPartitionedCall/resnet50/conv1_conv/Conv2D__6 OpType: Transpose is not output of any previous nodes.
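
For reference, the checker is complaining only about node ordering: the error suggests the Cast node that produces graph_input_cast_0 ends up after its consumer in graph.node. One possible workaround (a sketch, not an official fix from this repo; it assumes a single flat graph with no control-flow subgraphs) is to re-sort the converted graph's nodes topologically before validating:

import copy
import onnx

def toposort_graph(graph: onnx.GraphProto) -> None:
    # Names available before any node runs: graph inputs and initializers.
    available = {vi.name for vi in graph.input}
    available |= {init.name for init in graph.initializer}
    available.add("")  # optional (empty-string) inputs count as satisfied

    pending = [copy.deepcopy(node) for node in graph.node]
    ordered = []
    while pending:
        remaining = []
        for node in pending:
            if all(name in available for name in node.input):
                ordered.append(node)
                available.update(node.output)
            else:
                remaining.append(node)
        if len(remaining) == len(pending):
            raise RuntimeError("graph has a cycle or a dangling input")
        pending = remaining

    del graph.node[:]
    graph.node.extend(ordered)

toposort_graph(onnx_model_fp16.graph)
onnx.checker.check_model(onnx_model_fp16)  # the ordering complaint should be gone

This only fixes the ordering; any other validation problem in the converted model would still surface.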

The ResNet50.onnx model is attached (as a multi-part archive due to the maximum attachment size restriction; rename ResNet50.z01.zip to ResNet50.z01, ResNet50.z02.zip to ResNet50.z02, and ResNet50.z03.zip to ResNet50.z03).

There is a separate issue, microsoft/onnxruntime#15494, about a catastrophic failure in onnxruntime when attempting to load the fp16 model that fails validation.

ResNet50.zip
ResNet50.z01.zip
ResNet50.z02.zip
ResNet50.z03.zip

I'm facing the same issue. Are there any updates on this?

Here's a Colab notebook that reproduces the issue.

I suggest skipping check_model and just trying inference. Sometimes check_model fails even though inference works fine.
Note that check_model() belongs to the ONNX/ONNX codebase, not to this repository.
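
In case it helps, trying inference directly would look something like this (a sketch; the file name and the NHWC input shape are assumptions about this particular ResNet50 export, and as the comments below show, loading this converted model fails in onnxruntime as well):

import numpy as np
import onnx
import onnxruntime as ort

# onnx_model_fp16 comes from the snippet above; the file name is illustrative.
onnx.save(onnx_model_fp16, "resnet50_fp16.onnx")
session = ort.InferenceSession("resnet50_fp16.onnx")

# With keep_io_types=True the graph inputs stay float32.
input_meta = session.get_inputs()[0]
dummy = np.random.rand(1, 224, 224, 3).astype(np.float32)  # assumed NHWC shape
outputs = session.run(None, {input_meta.name: dummy})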

@bilalsoomro @SergeySandler Hope that helps - Xiaowu wrote the relevant packages you are importing :)

I'm also facing the same issue while using convert_float_to_float16(onnx_model, keep_io_types=True).

I suggest skipping check_model and just trying inference. Sometimes check_model fails even though inference works fine. Note that check_model() belongs to the ONNX/ONNX codebase, not to this repository.

Hi @xiaowuhu, I tried to perform inference, but I get the following error.

Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from /content/resnet50_fp16.onnx failed:/onnxruntime_src/onnxruntime/core/graph/graph.cc:1274 onnxruntime::Graph::Graph(const onnxruntime::Model&, onnx::GraphProto*, const std::unordered_map<std::basic_string, int>&, onnxruntime::Version, onnxruntime::IOnnxRuntimeOpSchemaCollectionPtr, onnxruntime::Graph*, const onnxruntime::Node*, const onnxruntime::logging::Logger&, bool) [ONNXRuntimeError] : 1 : FAIL : Tensor element type mismatch. 10 != 1
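
For what it's worth (my reading of the error, not stated in the thread): the two numeric codes in that message are onnx.TensorProto element types, so the runtime found a float16 tensor where a float32 one was declared, which fits a Cast or type annotation left out of sync by the conversion:

import onnx

# The codes in "Tensor element type mismatch. 10 != 1" are TensorProto types:
print(onnx.TensorProto.FLOAT16)  # 10
print(onnx.TensorProto.FLOAT)    # 1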

Did you get a chance to try this reproducible example? Colab link

I'm also facing the same issue while using convert_float_to_float16(onnx_model, keep_io_types=True).

Same here. Removing keep_io_types=True fixes it for me.
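
A sketch of that workaround (note that without keep_io_types=True the graph inputs and outputs become float16, so callers must feed and receive float16 data; the input shape is an assumption about this ResNet50 export):

import numpy as np
import onnx
import onnxruntime as ort
from onnxconverter_common import float16

onnx_model = onnx.load("/models/ResNet50.onnx")
# keep_io_types defaults to False, so the graph I/O is converted to float16 too.
onnx_model_fp16 = float16.convert_float_to_float16(onnx_model)
onnx.checker.check_model(onnx_model_fp16)

session = ort.InferenceSession(onnx_model_fp16.SerializeToString())
name = session.get_inputs()[0].name
dummy = np.random.rand(1, 224, 224, 3).astype(np.float16)  # inputs are now fp16
outputs = session.run(None, {name: dummy})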