tensorflow/tensorrt

Converting VGG19 model to TensorRT Engine : Shape Error in nvinfer1::rt::anonymous-namespace'::executeReshape: reshape would change volume

sandeepganage opened this issue · 1 comments

I ran the following command to convert the TensorFlow model to ONNX:
python -m tf2onnx.convert --saved-model "saved-model-path/" --output "path-output-model.onnx" --opset 12 --verbose

============================================================

It gave me the following output:
12:27:27,777 - INFO - tf2onnx.tfonnx: Using tensorflow=2.1.0, onnx=1.7.0, tf2onnx=1.7.1/796841
12:27:27,777 - INFO - tf2onnx.tfonnx: Using opset <onnx, 12>
12:28:20,327 - INFO - tf2onnx.tf_utils: Computed 0 values for constant folding
12:31:49,238 - VERBOSE - tf2onnx.tfonnx: Mapping TF node to ONNX node(s)
12:31:49,318 - VERBOSE - tf2onnx.tfonnx: Summay Stats:
tensorflow ops: Counter({'Const': 136, 'Conv2D': 21, 'BiasAdd': 21, 'Pad': 20, 'Relu': 16, 'Transpose': 9, 'Shape': 5, 'StridedSlice': 5, 'Identity': 4, 'MaxPool': 4, 'Mul': 4, 'ResizeNearestNeighbor': 4, 'NoOp': 2, 'Placeholder': 1, 'Pack': 1, 'Reshape': 1, 'Max': 1, 'Sub': 1, 'Exp': 1, 'Sum': 1, 'RealDiv': 1})
tensorflow attr: Counter({'dtype': 137, 'value': 136, 'T': 120, 'data_format': 46, 'strides': 25, 'padding': 25, 'dilations': 21, 'use_cudnn_on_gpu': 21, 'explicit_paddings': 21, 'Tpaddings': 20, 'Tperm': 9, 'out_type': 5, 'Index': 5, 'shrink_axis_mask': 5, 'ellipsis_mask': 5, 'begin_mask': 5, 'new_axis_mask': 5, 'end_mask': 5, 'ksize': 4, 'align_corners': 4, 'half_pixel_centers': 4, 'keep_dims': 2, 'Tidx': 2, 'shape': 1, 'axis': 1, 'N': 1, 'Tshape': 1})
onnx mapped: Counter({'Const': 74, 'BiasAdd': 20, 'Relu': 16, 'Transpose': 9, 'Shape': 5, 'StridedSlice': 5, 'Identity': 4, 'MaxPool': 4, 'Mul': 4, 'ResizeNearestNeighbor': 4, 'Placeholder': 1, 'Conv2D': 1, 'Pack': 1, 'Reshape': 1, 'Max': 1, 'Sub': 1, 'Exp': 1, 'Sum': 1, 'RealDiv': 1})
onnx unmapped: Counter()
12:31:49,321 - INFO - tf2onnx.optimizer: Optimizing ONNX model
12:31:49,322 - VERBOSE - tf2onnx.optimizer: Apply optimize_transpose
12:31:49,418 - VERBOSE - tf2onnx.optimizer.TransposeOptimizer: Const -46 (125->79), Reshape -20 (21->1), Transpose -16 (17->1)
12:31:49,419 - VERBOSE - tf2onnx.optimizer: Apply remove_redundant_upsample
12:31:49,439 - VERBOSE - tf2onnx.optimizer.UpsampleOptimizer: no change
12:31:49,439 - VERBOSE - tf2onnx.optimizer: Apply fold_constants
12:31:49,460 - VERBOSE - tf2onnx.optimizer.ConstFoldOptimizer: Unsqueeze -2 (3->1)
12:31:49,461 - VERBOSE - tf2onnx.optimizer: Apply loop_optimizer
12:31:49,479 - VERBOSE - tf2onnx.optimizer.LoopOptimizer: no change
12:31:49,479 - VERBOSE - tf2onnx.optimizer: Apply merge_duplication
12:31:49,540 - VERBOSE - tf2onnx.optimizer.MergeDuplicatedNodesOptimizer: Const -29 (79->50)
12:31:49,541 - VERBOSE - tf2onnx.optimizer: Apply remove_identity
12:31:49,816 - VERBOSE - tf2onnx.optimizer.IdentityOptimizer: Identity -5 (5->0)
12:31:49,817 - VERBOSE - tf2onnx.optimizer: Apply remove_back_to_back
12:31:49,840 - VERBOSE - tf2onnx.optimizer.BackToBackOptimizer: Squeeze -1 (1->0), Unsqueeze -1 (1->0)
12:31:49,840 - VERBOSE - tf2onnx.optimizer: Apply optimize_transpose
12:31:49,864 - VERBOSE - tf2onnx.optimizer.TransposeOptimizer: no change
12:31:49,864 - VERBOSE - tf2onnx.optimizer: Apply remove_redundant_upsample
12:31:49,881 - VERBOSE - tf2onnx.optimizer.UpsampleOptimizer: no change
12:31:49,881 - VERBOSE - tf2onnx.optimizer: Apply fold_constants
12:31:49,898 - VERBOSE - tf2onnx.optimizer.ConstFoldOptimizer: no change
12:31:49,898 - VERBOSE - tf2onnx.optimizer: Apply loop_optimizer
12:31:49,915 - VERBOSE - tf2onnx.optimizer.LoopOptimizer: no change
12:31:49,915 - VERBOSE - tf2onnx.optimizer: Apply merge_duplication
12:31:49,932 - VERBOSE - tf2onnx.optimizer.MergeDuplicatedNodesOptimizer: no change
12:31:49,932 - VERBOSE - tf2onnx.optimizer: Apply remove_identity
12:31:49,949 - VERBOSE - tf2onnx.optimizer.IdentityOptimizer: no change
12:31:49,949 - VERBOSE - tf2onnx.optimizer: Apply remove_back_to_back
12:31:49,966 - VERBOSE - tf2onnx.optimizer.BackToBackOptimizer: no change
12:31:49,968 - INFO - tf2onnx.optimizer: After optimization: Const -75 (125->50), Identity -5 (5->0), Reshape -20 (21->1), Squeeze -1 (1->0), Transpose -16 (17->1), Unsqueeze -3 (3->0)
12:31:49,980 - INFO - tf2onnx:
12:31:49,980 - INFO - tf2onnx: Successfully converted TensorFlow model D://TensorRt//artefact_model_10x//model//pb_model// to ONNX
12:31:50,775 - INFO - tf2onnx: ONNX model is saved at D://TensorRt//artefact_model_10x//model//ONNX_Model//model_opset12.onnx

============================================================

Everything looked good until I ran trtexec to generate the TensorRT engine:
[12:15:24] [I] Engine built in 21.4258 sec.
[12:15:24] [V] [TRT] Allocated persistent device memory of size 1070592
[12:15:24] [V] [TRT] Allocated activation device memory of size 10633728
[12:15:24] [V] [TRT] Assigning persistent memory blocks for various profiles
[12:15:24] [E] [TRT] StatefulPartitionedCall/sequential_1/reshape_1/Reshape: reshaping failed for tensor: StatefulPartitionedCall/sequential_1/conv2d_41/BiasAdd:0
[12:15:24] [E] [TRT] C:\source\rtExt\shapeMachine.cpp (160) - Shape Error in nvinfer1::rt::anonymous-namespace'::executeReshape: reshape would change volume
[12:15:24] [E] [TRT] Instruction: RESHAPE_ZERO_IS_PLACEHOLDER{1 2 1 1} {1 2 262144}
[12:15:24] [E] [TRT] StatefulPartitionedCall/sequential_1/reshape_1/Reshape: reshaping failed for tensor: StatefulPartitionedCall/sequential_1/conv2d_41/BiasAdd:0
[12:15:24] [E] [TRT] C:\source\rtExt\shapeMachine.cpp (160) - Shape Error in nvinfer1::rt::anonymous-namespace'::executeReshape: reshape would change volume
[12:15:24] [E] [TRT] Instruction: RESHAPE_ZERO_IS_PLACEHOLDER{1 2 1 1} {1 2 262144}
[12:15:24] [E] [TRT] StatefulPartitionedCall/sequential_1/reshape_1/Reshape: reshaping failed for tensor: StatefulPartitionedCall/sequential_1/conv2d_41/BiasAdd:0
[12:15:24] [E] [TRT] C:\source\rtExt\shapeMachine.cpp (160) - Shape Error in nvinfer1::rt::anonymous-namespace'::executeReshape: reshape would change volume
[12:15:24] [E] [TRT] Instruction: RESHAPE_ZERO_IS_PLACEHOLDER{1 2 1 1} {1 2 262144}
[12:15:24] [I] Starting inference
[12:15:24] [E] [TRT] StatefulPartitionedCall/sequential_1/reshape_1/Reshape: reshaping failed for tensor: StatefulPartitionedCall/sequential_1/conv2d_41/BiasAdd:0
..... ..... .....
[12:15:27] [E] [TRT] C:\source\rtExt\shapeMachine.cpp (160) - Shape Error in nvinfer1::rt::anonymous-namespace'::executeReshape: reshape would change volume
[12:15:27] [E] [TRT] Instruction: RESHAPE_ZERO_IS_PLACEHOLDER{1 2 1 1} {1 2 262144}
[12:15:27] [I] Warmup completed 0 queries over 200 ms
[12:15:27] [I] Timing trace has 0 queries over 3.01794 s
[12:15:27] [I] Trace averages of 10 runs:
[12:15:27] [I] Average on 10 runs - GPU latency: 13.0521 ms - Host latency: 13.0917 ms (end to end 13.1538 ms, enqueue 12.9954 ms)
[12:15:27] [I] Average on 10 runs - GPU latency: 13.029 ms - Host latency: 13.0801 ms (end to end 13.1425 ms, enqueue 12.9916 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.1875 ms - Host latency: 13.2357 ms (end to end 13.3059 ms, enqueue 13.137 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.3038 ms - Host latency: 13.3396 ms (end to end 13.402 ms, enqueue 13.2493 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.1988 ms - Host latency: 13.2585 ms (end to end 13.3239 ms, enqueue 13.1516 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.0985 ms - Host latency: 13.1461 ms (end to end 13.2 ms, enqueue 13.0505 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.1161 ms - Host latency: 13.1701 ms (end to end 13.2286 ms, enqueue 13.0624 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.1418 ms - Host latency: 13.2038 ms (end to end 13.2672 ms, enqueue 13.0776 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.1791 ms - Host latency: 13.2468 ms (end to end 13.3183 ms, enqueue 13.1162 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.1656 ms - Host latency: 13.2225 ms (end to end 13.2837 ms, enqueue 13.1109 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.0417 ms - Host latency: 13.0833 ms (end to end 13.1332 ms, enqueue 12.9986 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.0505 ms - Host latency: 13.0798 ms (end to end 13.1401 ms, enqueue 13.0056 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.1655 ms - Host latency: 13.2074 ms (end to end 13.2582 ms, enqueue 13.1146 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.1765 ms - Host latency: 13.2072 ms (end to end 13.2761 ms, enqueue 13.1262 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.1417 ms - Host latency: 13.213 ms (end to end 13.285 ms, enqueue 13.0948 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.15 ms - Host latency: 13.2029 ms (end to end 13.2796 ms, enqueue 13.0942 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.1601 ms - Host latency: 13.2137 ms (end to end 13.2796 ms, enqueue 13.1094 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.0856 ms - Host latency: 13.137 ms (end to end 13.1915 ms, enqueue 13.0331 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.2231 ms - Host latency: 13.275 ms (end to end 13.3372 ms, enqueue 13.1677 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.1157 ms - Host latency: 13.1676 ms (end to end 13.2279 ms, enqueue 13.0589 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.1341 ms - Host latency: 13.188 ms (end to end 13.2396 ms, enqueue 13.0834 ms)
[12:15:28] [I] Average on 10 runs - GPU latency: 13.1774 ms - Host latency: 13.241 ms (end to end 13.3011 ms, enqueue 13.1206 ms)
[12:15:28] [I] Host Latency
[12:15:28] [I] min: 12.6311 ms (end to end 12.6689 ms)
[12:15:28] [I] max: 13.9139 ms (end to end 14.0032 ms)
[12:15:28] [I] mean: 13.1901 ms (end to end 13.2523 ms)
[12:15:28] [I] median: 13.183 ms (end to end 13.2383 ms)
[12:15:28] [I] percentile: 13.8271 ms at 99% (end to end 13.8812 ms at 99%)
[12:15:28] [I] throughput: 0 qps
[12:15:28] [I] walltime: 3.01794 s
[12:15:28] [I] Enqueue Time
[12:15:28] [I] min: 12.5488 ms
[12:15:28] [I] max: 13.7494 ms
[12:15:28] [I] median: 13.0797 ms
[12:15:28] [I] GPU Compute
[12:15:28] [I] min: 12.6053 ms
[12:15:28] [I] max: 13.8533 ms
[12:15:28] [I] mean: 13.1394 ms
[12:15:28] [I] median: 13.1288 ms
[12:15:28] [I] percentile: 13.7648 ms at 99%
[12:15:28] [I] total compute time: 2.96951 s
&&&& PASSED TensorRT.trtexec # C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\trtexec.exe --onnx=D:\TensorRt\artefact_model_10x\model\ONNX_Model\model.onnx --saveEngine=D:\TensorRt\artefact_model_10x\model\TRT_Model\Model_tf2ONNX_opset12.engine --explicitBatch --verbose
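
To make the "reshape would change volume" message concrete, here is a minimal Python sketch of the check it describes. It assumes the two brace groups in the `Instruction:` line are the shape of the incoming tensor and the shape the Reshape node asks for; that reading is a guess from the log and not confirmed by TensorRT documentation. The 512x512 shape in the second call is purely hypothetical, chosen only because 512 * 512 = 262144.

```python
from math import prod

def reshape_preserves_volume(src_shape, dst_shape):
    """A reshape is only valid when both shapes hold the same number of elements."""
    return prod(src_shape) == prod(dst_shape)

# Shapes copied from the Instruction line in the log above (interpretation assumed).
src = (1, 2, 1, 1)
dst = (1, 2, 262144)
print(prod(src), prod(dst), reshape_preserves_volume(src, dst))

# Hypothetical input with 512x512 spatial dimensions: the element counts would match.
print(reshape_preserves_volume((1, 2, 512, 512), dst))
```

If that reading is right, the tensor reaching the Reshape node had spatial dimensions 1x1 where the node expected 262144 elements per channel, so the spatial size at engine build time may differ from the size the model was exported with.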

============================================================

This is the second TensorFlow model I have tried to convert to a TensorRT engine, again without success. The first one used a U-Net architecture; this one is VGG-19.
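
Note that the trtexec run above still ends with `&&&& PASSED` even though it logged `[E]` lines and reported 0 queries. A small sketch of how a saved log could be checked for that situation follows; the function name `summarize_trtexec_log` and the pass criteria are my own assumptions, not part of trtexec, and the sample text is a short excerpt of the log above.

```python
import re

def summarize_trtexec_log(log_text):
    """Count [E] lines and read the query count from the 'Timing trace' line."""
    error_lines = [line for line in log_text.splitlines() if "[E]" in line]
    match = re.search(r"Timing trace has (\d+) queries", log_text)
    queries = int(match.group(1)) if match else None
    # Hypothetical criterion: healthy only if no errors and at least one query ran.
    healthy = (not error_lines) and bool(queries)
    return {"errors": len(error_lines), "queries": queries, "healthy": healthy}

# Excerpt of the log from this issue.
sample = """[12:15:24] [I] Engine built in 21.4258 sec.
[12:15:24] [E] [TRT] Instruction: RESHAPE_ZERO_IS_PLACEHOLDER{1 2 1 1} {1 2 262144}
[12:15:27] [I] Timing trace has 0 queries over 3.01794 s
&&&& PASSED TensorRT.trtexec"""

print(summarize_trtexec_log(sample))
```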

This is the link to my TensorFlow model. If anyone wishes to attempt the conversion, feel free to do so:
https://drive.google.com/drive/folders/1NCvpEMzX8S13rNrtjcglPQ8ng2e3WKoc?usp=sharing

============================================================

tf2onnx has nothing to do with this project. Please refer to: https://github.com/onnx/tensorflow-onnx