tensorflow/tensorrt

Object Detection example with TRT7 and TF2.1 issues

mankeyboy opened this issue · 10 comments

I'm creating this issue to help collect the issues in the Object Detection example:

To start, I followed the steps and set up the dependencies. Now, attempting to run a synthetic test:

python object_detection.py  --input_saved_model_dir models/ssd_inception_v2_coco_2018_01_28/saved_model --output_saved_model_dir trt_engine --data_dir .  --input_size 640 --batch_size 1 --use_synthetic  --use_trt --precision FP16 --mode benchmark --num_iterations 100

Gives this error:

Benchmark arguments:
  annotation_path: None
  batch_size: 1
  calib_data_dir: None
  data_dir: .
  display_every: 100
  gpu_mem_cap: 0
  input_saved_model_dir: models/ssd_inception_v2_coco_2018_01_28/saved_model
  input_size: 640
  max_workspace_size: 1073741824
  minimum_segment_size: 2
  mode: benchmark
  num_calib_inputs: 500
  num_iterations: 100
  num_warmup_iterations: 50
  optimize_offline: False
  output_saved_model_dir: trt_engine
  precision: FP16
  target_duration: None
  use_synthetic: True
  use_trt: True
TensorRT Conversion Params:
  is_dynamic_op: True
  max_batch_size: 1
  max_workspace_size_bytes: 1073741824
  maximum_cached_engines: 1
  minimum_segment_size: 2
  precision_mode: FP16
  rewriter_config_template: None
  use_calibration: False
Conversion times:
  conversion: 49.2s
Traceback (most recent call last):
  File "object_detection.py", line 432, in <module>
    target_duration=args.target_duration)
  File "object_detection.py", line 179, in run_inference
    for i, batch_images in enumerate(dataset):
TypeError: 'NoneType' object is not iterable

On attempting to run a validation test:

python object_detection.py  --input_saved_model_dir models/ssd_inception_v2_coco_2018_01_28/saved_model --output_saved_model_dir trt_engine --data_dir coco/val2017  --annotation_path coco/annotations/instances_val2017.json --input_size 640 --batch_size 1  --use_trt --precision FP16

This error is observed:

Benchmark arguments:
  annotation_path: coco/annotations/instances_val2017.json
  batch_size: 1
  calib_data_dir: None
  data_dir: coco/val2017
  display_every: 100
  gpu_mem_cap: 0
  input_saved_model_dir: models/ssd_inception_v2_coco_2018_01_28/saved_model
  input_size: 640
  max_workspace_size: 1073741824
  minimum_segment_size: 2
  mode: validation
  num_calib_inputs: 500
  num_iterations: 2048
  num_warmup_iterations: 50
  optimize_offline: False
  output_saved_model_dir: trt_engine
  precision: FP16
  target_duration: None
  use_synthetic: False
  use_trt: True
TensorRT Conversion Params:
  is_dynamic_op: True
  max_batch_size: 1
  max_workspace_size_bytes: 1073741824
  maximum_cached_engines: 1
  minimum_segment_size: 2
  precision_mode: FP16
  rewriter_config_template: None
  use_calibration: False
Conversion times:
  conversion: 49.5s
loading annotations into memory...
Done (t=0.80s)
creating index...
index created!
2020-01-24 05:48:35.804643: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Preprocessor/map/while/ResizeImage/TRTEngineOp_293 with input shapes: [[1,640,640,3]]
2020-01-24 05:48:35.804722: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.7
2020-01-24 05:48:35.805518: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.7
2020-01-24 05:48:37.953129: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:48:37.953524: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/TRTEngineOp_0 with input shapes: [[1,300,300,3]]
2020-01-24 05:49:16.079274: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.081927: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/TRTEngineOp_292 with input shapes: [[1,1083,91], [1,600,91], [1,150,91], [1,54,91], [1,24,91], [1,6,91]]
2020-01-24 05:49:16.085025: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_19 with input shapes: [[6,2]]
2020-01-24 05:49:16.135179: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.135250: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_20 with input shapes: [[6,2]]
2020-01-24 05:49:16.156066: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.156228: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/TRTEngineOp_294 with input shapes: [[1,1083,1,4], [1,600,1,4], [1,150,1,4], [1,54,1,4], [1,24,1,4], [1,6,1,4]]
2020-01-24 05:49:16.169885: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.169962: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_9 with input shapes: [[1083,2]]
2020-01-24 05:49:16.170840: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_8 with input shapes: [[6,2], [6,2]]
2020-01-24 05:49:16.226191: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.231928: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.238409: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.238481: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_10 with input shapes: [[1083,2]]
2020-01-24 05:49:16.263111: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.263168: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_11 with input shapes: [[600,2]]
2020-01-24 05:49:16.263210: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_3 with input shapes: [[1083,2], [1083,2]]
2020-01-24 05:49:16.286966: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.294341: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.294402: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_12 with input shapes: [[600,2]]
2020-01-24 05:49:16.318996: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.319054: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_13 with input shapes: [[150,2]]
2020-01-24 05:49:16.319084: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_4 with input shapes: [[600,2], [600,2]]
2020-01-24 05:49:16.342890: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.349788: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.349848: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_14 with input shapes: [[150,2]]
2020-01-24 05:49:16.374470: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.374529: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_15 with input shapes: [[54,2]]
2020-01-24 05:49:16.374554: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_5 with input shapes: [[150,2], [150,2]]
2020-01-24 05:49:16.398877: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.406253: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.406313: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_16 with input shapes: [[54,2]]
2020-01-24 05:49:16.431354: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.431413: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_17 with input shapes: [[24,2]]
2020-01-24 05:49:16.431439: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_6 with input shapes: [[54,2], [54,2]]
2020-01-24 05:49:16.454656: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.463058: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.463119: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_18 with input shapes: [[24,2]]
2020-01-24 05:49:16.487814: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.487886: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/MultipleGridAnchorGenerator/TRTEngineOp_7 with input shapes: [[24,2], [24,2]]
2020-01-24 05:49:16.502006: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.502190: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/TRTEngineOp_2 with input shapes: [[1917], [1917], [1917], [1917]]
2020-01-24 05:49:16.610433: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.610520: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/TRTEngineOp_1 with input shapes: [[1917], [1917], [1917], [1917]]
2020-01-24 05:49:16.718736: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.718886: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Postprocessor/TRTEngineOp_291 with input shapes: [[1,1917,4]]
2020-01-24 05:49:16.737215: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.741486: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/TRTEngineOp_171 with input shapes: [[1917,1]]
2020-01-24 05:49:16.764692: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.764778: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/TRTEngineOp_174 with input shapes: [[1917,1]]
2020-01-24 05:49:16.779482: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_63/TRTEngineOp_81 with input shapes: [[0,4]]
2020-01-24 05:49:16.789812: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Parameter check failed at: ../builder/builder.cpp::setMaxBatchSize::135, condition: batchSize > 0 && batchSize <= MAX_BATCH_SIZE
2020-01-24 05:49:16.803796: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.803877: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/TRTEngineOp_177 with input shapes: [[1917,1]]
2020-01-24 05:49:16.805700: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_66/TRTEngineOp_84 with input shapes: [[14,4]]
2020-01-24 05:49:16.842692: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] DefaultLogger Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
2020-01-24 05:49:16.842728: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger Parameter check failed at: engine.cpp::enqueue::292, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: 0, but engine max batch size was: 1
2020-01-24 05:49:16.842741: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:635] Failed to enqueue batch for TRT engine: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_63/TRTEngineOp_81
2020-01-24 05:49:16.842752: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:506] Failed to execute engine, retrying with native segment for StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_63/TRTEngineOp_81
2020-01-24 05:49:16.843134: F tensorflow/core/framework/op_kernel.cc:875] Check failed: mutable_output(index) == nullptr (0x7ff5cc03d7c0 vs. nullptr)
Aborted
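
The log above builds many engines, and the one that fails is the one whose input shape has a zero in the batch dimension ([[0,4]]). As a rough aid for spotting such engines in a long log, here is a minimal sketch that scans the "Building a new TensorRT engine" lines and reports any engine with a zero-sized input dimension. The function name find_empty_engine_inputs is made up for illustration, and the regular expression assumes the log format shown above.

```python
import re

# Matches lines such as:
#   "... Building a new TensorRT engine for <name> with input shapes: [[0,4]]"
# The format is assumed from the log in this issue.
ENGINE_RE = re.compile(
    r"Building a new TensorRT engine for (?P<name>\S+) "
    r"with input shapes: (?P<shapes>\[.*\])"
)


def find_empty_engine_inputs(log_text):
    """Return (engine_name, shapes) pairs where any input shape contains a 0."""
    hits = []
    for line in log_text.splitlines():
        match = ENGINE_RE.search(line)
        if match is None:
            continue
        # Each innermost [...] group is one input shape, e.g. "1,300,300,3".
        shapes = [
            [int(dim) for dim in shape.split(",")]
            for shape in re.findall(r"\[([\d,\s]+)\]", match.group("shapes"))
        ]
        if any(0 in shape for shape in shapes):
            hits.append((match.group("name"), shapes))
    return hits


# Two lines copied from the log above.
sample_log = """\
2020-01-24 05:48:37.953524: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/TRTEngineOp_0 with input shapes: [[1,300,300,3]]
2020-01-24 05:49:16.779482: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_63/TRTEngineOp_81 with input shapes: [[0,4]]
"""

for name, shapes in find_empty_engine_inputs(sample_log):
    print(name, shapes)
```

Running it over the full log should list only the engines that receive an empty tensor, which are the ones to compare against the "Batch size was: 0" error.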

This is the exact same issue I am seeing here: tensorflow/tensorflow#33184 (comment)

Looks like this is more widespread than just me. Hopefully this means it will get more attention.

I hit a similar error. My test environment is "tf2.0-trt7.0".

Benchmark arguments:
  annotation_path: None
  batch_size: 1
  calib_data_dir: None
  data_dir: .
  display_every: 100
  gpu_mem_cap: 0
  input_saved_model_dir: /home/suhyung/work/git/tf_trt_models/examples/detection/data/faster_rcnn_resnet50_coco_2018_01_28/saved_model/
  input_size: 640
  max_workspace_size: 1073741824
  minimum_segment_size: 2
  mode: benchmark
  num_calib_inputs: 500
  num_iterations: 100
  num_warmup_iterations: 50
  optimize_offline: False
  output_saved_model_dir: trt_engine
  precision: FP16
  target_duration: None
  use_synthetic: True
  use_trt: True
TensorRT Conversion Params:
  is_dynamic_op: True
  max_batch_size: 1
  max_workspace_size_bytes: 1073741824
  maximum_cached_engines: 1
  minimum_segment_size: 2
  precision_mode: FP16
  rewriter_config_template: None
  use_calibration: False
Conversion times:
  conversion: 49.2s
Traceback (most recent call last):
  File "object_detection.py", line 432, in <module>
    target_duration=args.target_duration)
  File "object_detection.py", line 160, in run_inference
    input_size=input_size)
TypeError: cannot unpack non-iterable NoneType object
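
Both this traceback and the synthetic-run traceback in the first post are the kind of TypeError Python raises when a function returns None and the caller then iterates over or unpacks the result. The sketch below reproduces the two messages with a hypothetical helper, build_dataset, whose synthetic branch falls through without a return. This is only an assumption about the shape of the bug; it is not taken from object_detection.py.

```python
def build_dataset(use_synthetic):
    """Hypothetical helper: only the real-data branch returns a value."""
    if not use_synthetic:
        return [[0.0, 0.0, 1.0, 1.0]]  # stand-in for real batches
    # Falling off the end here implicitly returns None.


def describe_failure(action):
    """Run `action` and return the TypeError message it raises, or "ok"."""
    try:
        action()
    except TypeError as err:
        return str(err)
    return "ok"


def iterate_synthetic():
    # Mirrors `for i, batch_images in enumerate(dataset)` with dataset = None.
    for _ in enumerate(build_dataset(use_synthetic=True)):
        pass


def unpack_synthetic():
    # Mirrors a tuple-unpacking assignment from a call that returned None.
    dataset, image_ids = build_dataset(use_synthetic=True)
    return dataset, image_ids


print(describe_failure(iterate_synthetic))
print(describe_failure(unpack_synthetic))
```

If the real script follows this pattern, the place to look is the code path taken when --use_synthetic is set.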

I've been able to clear a few of the above errors, and I can now get it working for even batch sizes using the models from the r1.14+ branch of the code. However, the output doesn't have the correct accuracy, and the logs indicate this is because the saved_model.pb in the model, e.g.:

'ssd_inception_v2_coco':
    Model(
        'ssd_inception_v2_coco',
        'http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2018_01_28.tar.gz',
        'ssd_inception_v2_coco_2018_01_28',
    )

doesn't have variables saved in the variables folder, so I'm basically running an untrained graph.
This function controls how the saved model is loaded onto the graph. The pretrained model has a checkpoint file and a frozen_inference_graph, but TensorRT takes only a SavedModel in TF2.x, so the only way is to load the checkpoint file or the frozen_inference_graph and convert it into a SavedModel.
First, I tried this modification to the function to get to a SavedModel from the checkpoint:

with tf.compat.v1.Session() as sess:
    # Restore the graph structure and the weights from the checkpoint.
    new_saver = tf.compat.v1.train.import_meta_graph(saved_model_dir + '/model.ckpt.meta')
    new_saver.restore(sess, tf.train.latest_checkpoint(saved_model_dir + '/'))
    graph_func = tf.compat.v1.graph_util.convert_variables_to_constants(
        sess,
        tf.compat.v1.get_default_graph().as_graph_def(),
        output_node_names=['detection_boxes', 'detection_classes', 'detection_scores', 'num_detections'])
    # image_tensor, detection_boxes, etc. must be tensors looked up from the restored graph.
    tf.compat.v1.saved_model.simple_save(
        sess, saved_model_dir + '/test',
        inputs={'image_tensor': image_tensor},
        outputs={'detection_boxes': detection_boxes, 'detection_classes': detection_classes, 'detection_scores': detection_scores, 'num_detections': num_detections})

The code fails on this call because of errors in the placeholders and input_names, and Stack Overflow answers say that I need access to the original function that created this checkpoint in order to convert it.

Hence, the next approach, converting from frozen_inference_graph.pb:

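import os

import tensorflow as tf
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants

INPUT_NAME = 'image_tensor'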
BOXES_NAME = 'detection_boxes'
CLASSES_NAME = 'detection_classes'
SCORES_NAME = 'detection_scores'
NUM_DETECTIONS_NAME = 'num_detections'
FROZEN_GRAPH_NAME = 'frozen_inference_graph.pb'

def get_func_from_saved_model(saved_model_dir):

  builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(saved_model_dir+'/test')
  frozen_graph_path = os.path.join(saved_model_dir, FROZEN_GRAPH_NAME)
  print(frozen_graph_path)
  graph_func = tf.compat.v1.GraphDef()
  with open(frozen_graph_path, 'rb') as f:
    graph_func.ParseFromString(f.read())
  
  sigs = {}
  with tf.compat.v1.Session(graph=tf.compat.v1.Graph()) as sess:
    # name="" is important to ensure we don't get spurious prefixing
    tf.compat.v1.import_graph_def(graph_func, name="")
    tf_graph = tf.compat.v1.get_default_graph()
    tf_input = tf_graph.get_tensor_by_name(INPUT_NAME+':0')
    tf_boxes = tf_graph.get_tensor_by_name(BOXES_NAME + ':0')
    tf_classes = tf_graph.get_tensor_by_name(CLASSES_NAME + ':0')
    tf_scores = tf_graph.get_tensor_by_name(SCORES_NAME + ':0')
    tf_num_detections = tf_graph.get_tensor_by_name(NUM_DETECTIONS_NAME + ':0')

    sigs[signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY] = \
        tf.compat.v1.saved_model.signature_def_utils.predict_signature_def(
            {INPUT_NAME: tf_input}, {BOXES_NAME: tf_boxes, CLASSES_NAME: tf_classes, SCORES_NAME: tf_scores, NUM_DETECTIONS_NAME: tf_num_detections})

    builder.add_meta_graph_and_variables(sess,
                                         [tag_constants.SERVING],
                                         signature_def_map=sigs)
  builder.save()
  saved_model_loaded = tf.saved_model.load(
      saved_model_dir+'/test', tags=[tag_constants.SERVING])
  graph_func = saved_model_loaded.signatures[
      signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
  return graph_func 

This works, but it doesn't create any variables folder, so the resulting SavedModel is completely untrained, which defeats the purpose. I'm getting throughput numbers, but my mAP value shows that this is an untrained graph run.
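
To check what actually ended up on disk after a conversion like the one above, a small helper can report whether the usual SavedModel files are present. This is a minimal sketch that assumes the standard layout (saved_model.pb next to a variables/ directory containing variables.index); the function name describe_saved_model is made up for illustration.

```python
import os


def describe_saved_model(saved_model_dir):
    """Report which parts of the standard SavedModel layout exist on disk."""
    variables_dir = os.path.join(saved_model_dir, "variables")
    return {
        "has_graph": os.path.isfile(os.path.join(saved_model_dir, "saved_model.pb")),
        "has_variables_dir": os.path.isdir(variables_dir),
        "has_variables_index": os.path.isfile(
            os.path.join(variables_dir, "variables.index")
        ),
    }


if __name__ == "__main__":
    # Path taken from the conversion function above (saved_model_dir + '/test').
    print(describe_saved_model("models/ssd_inception_v2_coco_2018_01_28/test"))
```

One caveat: a frozen graph stores its weights as constants inside the graph itself, so a missing or empty variables folder on its own may not prove that the weights are gone. It is worth comparing raw detections against the r1.14+ run before concluding the graph is untrained.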

My run call is: python object_detection.py --input_saved_model_dir models/ssd_inception_v2_coco_2018_01_28 --output_saved_model_dir trt_engine --data_dir coco/val2017 --annotation_path coco/annotations/instances_val2017.json --input_size 640 --batch_size 8 --num_warmup_iterations 10 --minimum_segment_size 3 --num_iterations 50 --use_trt --precision FP16

@pooyadavoodi @vinhngx @aaroey Any tips? I'm looking for a way to get the trained model loaded properly like it was for r1.14+

@tfeher could you help to take a look at this?
Also @bixia1

I updated tensorflow/tensorflow#36724 with new comments for the bug I have raised there.

I am having a different issue with the command: python object_detection.py --input_saved_model_dir $HOME/trt/obj_models/ssd_mobilenet_v2_coco_2018_03_29/saved_model/ --output_saved_model_dir $HOME/trt/obj_out_dir --optimize_offline --data_dir $HOME/trt/coco_data/val2017 --annotation_path $HOME/trt/coco_data/annotations/instances_val2017.json --batch_size 1 --use_trt --mode benchmark --precision FP32 --input_size 640
CUDA 10.2
CUDNN 7.6.5
TRT - 7
TF : master (after 2.1.0)

Traceback (most recent call last):
  File "object_detection.py", line 410, in <module>
    optimize_offline=args.optimize_offline)
  File "object_detection.py", line 121, in get_graph_func
    converter.build(input_fn=partial(input_fn, data_dir, 1))
  File "/home/vinod/nvidia/p3_env/lib/python3.6/site-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 1116, in build
    self._converted_func(map(ops.convert_to_tensor, inp))
  File "/home/vinod/nvidia/p3_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1600, in call
    return self._call_impl(args, kwargs)
  File "/home/vinod/nvidia/p3_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1640, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/home/vinod/nvidia/p3_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1741, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/vinod/nvidia/p3_env/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/home/vinod/nvidia/p3_env/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Incorrect batch dimension, for Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_84/TRTEngineOp_86: [[0,4]]
[[node Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_84/TRTEngineOp_86 (defined at object_detection.py:118) ]]
[[Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayStack_4/range/_68]]
(1) Invalid argument: Incorrect batch dimension, for Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_84/TRTEngineOp_86: [[0,4]]
[[node Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/ClipToWindow_84/TRTEngineOp_86 (defined at object_detection.py:118) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_pruned_66916]

@aaroey: is this a known issue? I am facing the segmentation issue even in the latest TensorFlow container from NGC, 20.01-tf1-py3.

@vdevaram You can work around your issue by passing --minimum_segment_size 3 when you make your run. I have already opened a bug related to this at tensorflow. The default segment size used by TensorRT for optimisations is 3, while the code tries to use 2, which, although suboptimal according to the recommendations, shouldn't fail. Discussion of this is ongoing on the other issue :)

  • As @mankeyboy said, using minimum_segment_size=3 or larger should help to get around the conversion problem.
  • The segfault which is present in NGC 20.01-tf2-py3 is not there in the latest TF master.
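
For readers unfamiliar with the option: minimum_segment_size sets the smallest number of nodes a convertible subgraph must contain before it is replaced by a TRTEngineOp, and smaller subgraphs stay in native TensorFlow. The sketch below illustrates the idea on a simplified linear chain of ops. The op names and the select_segments function are made up for illustration, and the real segmenter works on a general graph rather than a list.

```python
def select_segments(ops, minimum_segment_size):
    """Group consecutive convertible ops and keep groups that are large enough.

    `ops` is a list of (name, convertible) pairs describing a linear chain.
    """
    segments, current = [], []
    for name, convertible in ops:
        if convertible:
            current.append(name)
            continue
        # A non-convertible op closes the current candidate segment.
        if len(current) >= minimum_segment_size:
            segments.append(current)
        current = []
    if len(current) >= minimum_segment_size:
        segments.append(current)
    return segments


# Hypothetical chain: three convertible ops, a blocker, then two convertible ops.
chain = [
    ("Conv2D", True),
    ("Relu", True),
    ("Conv2D_1", True),
    ("NonMaxSuppression", False),
    ("Minimum", True),
    ("Maximum", True),
    ("Where", False),
]

print(select_segments(chain, minimum_segment_size=2))
print(select_segments(chain, minimum_segment_size=3))
```

With a threshold of 2 both groups become engines; with 3 the two-op group is left to TensorFlow, so fewer tiny engines are built.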

Now I have moved to nvcr.io/nvidia/tensorflow:20.02-tf2-py3 and tried the TF object detection models. Here is the result for Faster R-CNN. Although it is working, I am seeing a lot of latency variation, with the temperature rising up to 85C. Is there any other problem?

cmd : python object_detection.py --input_saved_model_dir /local/obj_models/faster_rcnn_resnet50_coco_2018_01_28/saved_model/ --output_saved_model_dir /local/obj_out_dir --optimize_offline --data_dir /local/coco_data/val2017 --annotation_path /local/coco_data/annotations/instances_val2017.json --batch_size 1 --use_trt --mode benchmark --precision FP32 --input_size 600 --minimum_segment_size 3

benchmark result :
step 101/2048, iter_time(ms)=86
step 201/2048, iter_time(ms)=93
step 301/2048, iter_time(ms)=90
step 401/2048, iter_time(ms)=89
step 501/2048, iter_time(ms)=90
step 601/2048, iter_time(ms)=85
step 701/2048, iter_time(ms)=91
step 801/2048, iter_time(ms)=87
step 901/2048, iter_time(ms)=91
step 1001/2048, iter_time(ms)=92
step 1101/2048, iter_time(ms)=87
step 1201/2048, iter_time(ms)=89
step 1301/2048, iter_time(ms)=100
step 1401/2048, iter_time(ms)=106
step 1501/2048, iter_time(ms)=125
step 1601/2048, iter_time(ms)=204
step 1701/2048, iter_time(ms)=111
step 1801/2048, iter_time(ms)=118
step 1901/2048, iter_time(ms)=108
step 2001/2048, iter_time(ms)=255
Results:
images/sec: 9
99th_percentile(ms): 378.12
total_time(s): 225.9
latency_mean(ms): 115.90
latency_median(ms): 92.96
latency_min(ms): 77.35
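
To put a number on the latency drift in the output above, the per-step lines can be parsed and the early part of the run compared with the late part. This is a minimal sketch; the split point at step 1201 and the function names are arbitrary choices made for illustration, and the sampled values are only the ones printed every 100 steps, not every iteration.

```python
import re
from statistics import mean

# Matches lines such as "step 101/2048, iter_time(ms)=86".
STEP_RE = re.compile(r"step (\d+)/(\d+), iter_time\(ms\)=(\d+(?:\.\d+)?)")


def parse_iter_times(log_text):
    """Return a list of (step, iter_time_ms) pairs found in the log text."""
    return [(int(step), float(ms)) for step, _, ms in STEP_RE.findall(log_text)]


def drift_ratio(samples, split_step):
    """Mean iteration time after `split_step` divided by the mean up to it."""
    early = [ms for step, ms in samples if step <= split_step]
    late = [ms for step, ms in samples if step > split_step]
    return mean(late) / mean(early)


# Values copied from the benchmark output above.
benchmark_log = """\
step 101/2048, iter_time(ms)=86
step 201/2048, iter_time(ms)=93
step 301/2048, iter_time(ms)=90
step 401/2048, iter_time(ms)=89
step 501/2048, iter_time(ms)=90
step 601/2048, iter_time(ms)=85
step 701/2048, iter_time(ms)=91
step 801/2048, iter_time(ms)=87
step 901/2048, iter_time(ms)=91
step 1001/2048, iter_time(ms)=92
step 1101/2048, iter_time(ms)=87
step 1201/2048, iter_time(ms)=89
step 1301/2048, iter_time(ms)=100
step 1401/2048, iter_time(ms)=106
step 1501/2048, iter_time(ms)=125
step 1601/2048, iter_time(ms)=204
step 1701/2048, iter_time(ms)=111
step 1801/2048, iter_time(ms)=118
step 1901/2048, iter_time(ms)=108
step 2001/2048, iter_time(ms)=255
"""

samples = parse_iter_times(benchmark_log)
print("samples:", len(samples))
print("late/early ratio: %.2f" % drift_ratio(samples, split_step=1201))
```

A ratio well above 1 that grows with run time would be consistent with thermal throttling, although logging the GPU clocks and temperature alongside the run would be needed to confirm it.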