tensorflow/tensorrt

TensorRT-optimized model output shape becomes unknown?

luvwinnie opened this issue · 2 comments

I have a trained TensorFlow SavedModel and optimize it with the following code.

from tensorflow.python.compiler.tensorrt import trt_convert as trt
converter = trt.TrtGraphConverterV2(input_saved_model_dir="feat_ext")
converter.convert()
converter.save("tensorrt_model_feat_ext")
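
For reference, the TF-TRT guide also describes pre-building the TensorRT engines with converter.build() before saving; a minimal sketch with an assumed representative input (the width of 128 is a placeholder, my real input is (-1, 64, -1, 3)):

import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(input_saved_model_dir="feat_ext")
converter.convert()

# Feed one representative input so the engines are built before saving.
# The width of 128 is an assumption; the model accepts (-1, 64, -1, 3).
def input_fn():
    yield (np.zeros((1, 64, 128, 3), dtype=np.float32),)

converter.build(input_fn=input_fn)
converter.save("tensorrt_model_feat_ext")

I have not verified whether pre-building changes the exported signature shape.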

However, the output shape of the TensorRT-optimized SavedModel has become unknown.

Before optimization

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 64, -1, 3)
        name: serving_default_input_1:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['lambda'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, -1, 2048)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict

After optimization

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 64, -1, 3)
        name: serving_default_input_1:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['lambda'] tensor_info:
        dtype: DT_FLOAT
        shape: unknown_rank
        name: PartitionedCall:0
  Method name is: tensorflow/serving/predict
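
Even though the signature now reports unknown_rank, a quick way to check the actual output shape at runtime is to load the converted model and run one dummy batch (the width of 128 is an assumption):

import tensorflow as tf

loaded = tf.saved_model.load("tensorrt_model_feat_ext")
infer = loaded.signatures["serving_default"]
# Dummy input; width 128 is an assumption, the model accepts (-1, 64, -1, 3).
dummy = tf.zeros((1, 64, 128, 3), dtype=tf.float32)
print(infer(input_1=dummy)["lambda"].shape)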

This unknown output shape prevents my TensorRT Inference Server from loading the model.
I use the following config.pbtxt for the model.

config.pbtxt

name: "feat_ext"
platform: "tensorflow_savedmodel"
max_batch_size:16
input [
{
  name:"input_1"
  data_type: TYPE_FP32
  dims: [ 64, -1, 3 ]
}
]
output [
{
  name:"lambda"
  data_type: TYPE_FP32
  dims: [-1, 2048]
}
]
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

Errors

failed to load 'feat_ext' version 1: Internal: Can't parse /models/feat_ext/1/model.savedmodel/saved_model.pb as binary proto
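
The "Can't parse ... as binary proto" part suggests the saved_model.pb at that path is not being read as a valid binary SavedModel proto at all. A quick sanity check is to try parsing it directly (the path mirrors the error message; my repository layout is an assumption):

from tensorflow.core.protobuf import saved_model_pb2

# Try to parse the file the server is complaining about as a binary SavedModel proto.
sm = saved_model_pb2.SavedModel()
with open("/models/feat_ext/1/model.savedmodel/saved_model.pb", "rb") as f:
    sm.ParseFromString(f.read())
print("parsed", len(sm.meta_graphs), "meta graph(s)")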

TensorRT Optimization Logs

2020-03-20 22:57:10.929109: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:796] Optimization results for grappler item: graph_to_optimize
2020-03-20 22:57:10.929127: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:798]   function_optimizer: Graph size after: 597 nodes (442), 1215 edges (1060), time = 9.1ms.
2020-03-20 22:57:10.929131: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:798]   function_optimizer: function_optimizer did nothing. time = 0.153ms.
2020-03-20 22:57:16.884562: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-20 22:57:16.885051: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-20 22:57:16.885782: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 2
2020-03-20 22:57:16.885849: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-03-20 22:57:16.886546: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-20 22:57:16.886922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 0 with properties:
pciBusID: 0000:0d:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2020-03-20 22:57:16.886966: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-20 22:57:16.887697: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1558] Found device 1 with properties:
pciBusID: 0000:0e:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2020-03-20 22:57:16.887730: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-03-20 22:57:16.887739: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-20 22:57:16.887747: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-20 22:57:16.887756: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-20 22:57:16.887763: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-20 22:57:16.887771: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-20 22:57:16.887779: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-20 22:57:16.887820: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-20 22:57:16.888236: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-20 22:57:16.888999: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-20 22:57:16.889388: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-20 22:57:16.890116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1700] Adding visible gpu devices: 0, 1
2020-03-20 22:57:16.890144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1099] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-20 22:57:16.890149: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105]      0 1
2020-03-20 22:57:16.890154: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 0:   N Y
2020-03-20 22:57:16.890158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1118] 1:   Y N
2020-03-20 22:57:16.890238: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-20 22:57:16.890639: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-20 22:57:16.891401: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-20 22:57:16.891764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22749 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:0d:00.0, compute capability: 7.5)
2020-03-20 22:57:16.891812: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-20 22:57:16.892553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1244] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22770 MB memory) -> physical GPU (device: 1, name: TITAN RTX, pci bus id: 0000:0e:00.0, compute capability: 7.5)
2020-03-20 22:57:24.476520: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 6 ops of 3 different types in the graph that are not converted to TensorRT: Identity, NoOp, Placeholder, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2020-03-20 22:57:24.493541: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:638] Number of TensorRT candidate segments: 1
2020-03-20 22:57:25.134591: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:739] Replaced segment 0 consisting of 268 nodes by TRTEngineOp_0.
2020-03-20 22:57:31.830885: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:796] Optimization results for grappler item: tf_graph
2020-03-20 22:57:31.830930: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:798]   constant_folding: Graph size after: 420 nodes (-152), 886 edges (-304), time = 3281.7041ms.
2020-03-20 22:57:31.830934: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:798]   layout: Graph size after: 424 nodes (4), 890 edges (4), time = 1114.55505ms.
2020-03-20 22:57:31.830937: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:798]   constant_folding: Graph size after: 424 nodes (0), 890 edges (0), time = 935.476ms.
2020-03-20 22:57:31.830940: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:798]   TensorRTOptimizer: Graph size after: 157 nodes (-267), 158 edges (-732), time = 2465.43896ms.
2020-03-20 22:57:31.830943: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:798]   constant_folding: Graph size after: 157 nodes (0), 158 edges (0), time = 13.608ms.
2020-03-20 22:57:31.830946: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:796] Optimization results for grappler item: TRTEngineOp_0_native_segment
2020-03-20 22:57:31.830950: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:798]   constant_folding: Graph size after: 270 nodes (0), 431 edges (0), time = 1124.15295ms.
2020-03-20 22:57:31.830952: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:798]   layout: Graph size after: 270 nodes (0), 431 edges (0), time = 1266.77405ms.
2020-03-20 22:57:31.830957: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:798]   constant_folding: Graph size after: 270 nodes (0), 431 edges (0), time = 1085.67ms.
2020-03-20 22:57:31.830960: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:798]   TensorRTOptimizer: Graph size after: 270 nodes (0), 431 edges (0), time = 153.51ms.
2020-03-20 22:57:31.830972: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:798]   constant_folding: Graph size after: 270 nodes (0), 431 edges (0), time = 1132.224ms.
2020-03-20 22:57:43.577481: W tensorflow/core/framework/op_kernel.cc:1719] OP_REQUIRES failed at trt_engine_resource_ops.cc:183 : Not found: Container TF-TRT does not exist. (Could not find resource: TF-TRT/TRTEngineOp_0)
WARNING: Logging before flag parsing goes to stderr.
W0320 22:57:44.060742 140478260471616 deprecation.py:506] From /home/usr16/cheesiang_leow/anaconda3/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1809: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.

Does anyone know how to solve the problem?

@luvwinnie
I have the same problem. Do you have any idea how to solve it?

@luckycallor Sorry, I haven't solved this problem yet, so I'm not using TensorRT currently.