tensorflow/tensorrt

TensorRT is unable to convert the FusedBatchNormV2 op

danfischetti opened this issue · 1 comments

Description

TensorRT is unable to convert the FusedBatchNormV2 op. When I turn on verbose logging I see messages like this:

2020-03-14 17:28:54.501494: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:439] Not a TF-TRT candidate, (Op type: FusedBatchNormV2), (Op name: pose/pose/resnet_v2_50/block1/unit_1/bottleneck_v2/preact/FusedBatchNormV2), (Reason: Unimplemented: FusedBatchNormV2 only supports data_format=NCHW, at pose/pose/resnet_v2_50/block1/unit_1/bottleneck_v2/preact/FusedBatchNormV2)

I explicitly converted my graph to use the channels-first data format after seeing that channels_last is not supported for batch norm with TensorRT. I also confirmed that the frozen graph stores the node definitions in NCHW form. Inspecting the node referenced in the log message shows:

name: "pose/pose/resnet_v2_50/block1/unit_1/bottleneck_v2/preact/FusedBatchNormV2"
op: "FusedBatchNormV2"
input: "pose/pose/resnet_v2_50/pool1/MaxPool"
input: "pose/resnet_v2_50/block1/unit_1/bottleneck_v2/preact/gamma/read"
input: "pose/resnet_v2_50/block1/unit_1/bottleneck_v2/preact/beta/read"
input: "pose/resnet_v2_50/block1/unit_1/bottleneck_v2/preact/moving_mean/read"
input: "pose/resnet_v2_50/block1/unit_1/bottleneck_v2/preact/moving_variance/read"
attr {
  key: "T"
  value {
    type: DT_HALF
  }
}
attr {
  key: "U"
  value {
    type: DT_FLOAT
  }
}
attr {
  key: "data_format"
  value {
    s: "NCHW"
  }
}
attr {
  key: "epsilon"
  value {
    f: 1.001e-05
  }
}
attr {
  key: "is_training"
  value {
    b: false
  }
}

The data_format is NCHW and is_training is false, so I would expect TensorRT to be able to convert this node.
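The same check can be run over every batch norm node in the frozen graph rather than one node at a time. The following is a minimal sketch, not part of the original report: the helper names `summarize_batch_norm_nodes` and `load_graph_def` are hypothetical, and it assumes the graph was frozen to a `.pb` file readable through `tf.compat.v1.GraphDef`.

```python
def summarize_batch_norm_nodes(graph_def):
    """Return (name, op, data_format, is_training) for each fused batch norm node.

    Works on any object exposing `.node`, where each node has `.name`, `.op`
    and a mapping-like `.attr` (as a TensorFlow GraphDef does).
    """
    rows = []
    for node in graph_def.node:
        if not node.op.startswith("FusedBatchNorm"):
            continue
        # Membership test avoids creating default entries in the attr map.
        data_format = node.attr["data_format"].s if "data_format" in node.attr else None
        if isinstance(data_format, bytes):
            data_format = data_format.decode("utf-8")
        is_training = node.attr["is_training"].b if "is_training" in node.attr else None
        rows.append((node.name, node.op, data_format, is_training))
    return rows


def load_graph_def(path):
    """Load a frozen GraphDef from disk (hypothetical helper; imports TensorFlow lazily)."""
    import tensorflow as tf

    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return graph_def
```

Calling `summarize_batch_norm_nodes(load_graph_def(path))` on the frozen graph and printing each row should show at a glance whether any batch norm node is still stored as NHWC or with is_training set.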

Environment

TensorRT Version: 6.0.1.10
GPU Type: Jetson Xavier
Nvidia Driver Version:
CUDA Version: 10.0
CUDNN Version: 7.6.3
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version: 1.15
Baremetal or Container (if container which image + tag): Baremetal

I've reproduced the issue with a minimal script and tracked down when it occurs: adding a batch_norm operation inside a convolution layer (as the layer's normalizer_fn) triggers it. A sample script that reproduces the problem follows.

import argparse
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import tensorflow as tf
import datetime
from tensorflow.python.compat import compat
import tensorflow.contrib.slim as slim


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--with_conv", action="store_true")
    args = parser.parse_args()
    with_conv = args.with_conv
    compat._update_forward_compatibility_date_number(datetime.date(2019, 6, 6)) # force tensorflow to use FusedBatchNormV2
    with tf.Session() as sess:
        input0 = tf.placeholder(tf.float16, [10, 3, 224, 224])
        out = input0

        # Two standalone batch norms, one per layout; both convert when the conv is absent.
        out = slim.batch_norm(out, data_format="NCHW", scope="bn1", is_training=False)
        out = slim.batch_norm(out, data_format="NHWC", scope="bn2", is_training=False)
        with slim.arg_scope([slim.conv2d, slim.batch_norm], data_format="NCHW"), slim.arg_scope([slim.batch_norm], is_training=False):
            if with_conv:
                # Batch norm used as the conv's normalizer_fn; adding this triggers the failure.
                out = slim.conv2d(out, 64, [3, 3], normalizer_fn=slim.batch_norm)
        out = tf.identity(out)
        out_name = out.name[:-2]

        init = tf.global_variables_initializer()
        sess.run(init)

        frozen_graph = tf.graph_util.convert_variables_to_constants(
                sess, tf.get_default_graph().as_graph_def(), output_node_names=[out_name])

        with tf.gfile.GFile(f"/home/standard/minimal{'_with_conv' if with_conv else ''}.pb", "wb") as f:
            f.write(frozen_graph.SerializeToString())

        converter = trt.TrtGraphConverter(
            input_graph_def=frozen_graph,
            nodes_blacklist=[out_name],
            max_batch_size=10, max_workspace_size_bytes=5000000000,
            maximum_cached_engines=256,
            precision_mode="FP16", is_dynamic_op=True)
        trt_graph = converter.convert()

        with tf.gfile.GFile("/home/standard/dummy.pb", "wb") as f:
            f.write(trt_graph.SerializeToString())


if __name__ == "__main__":
    main()

Running this script without the --with_conv flag gives the expected results. I've included an NCHW batch norm and an NHWC batch norm to demonstrate that TensorRT has no problem changing the layout of the op.

TF_CPP_VMODULE=segment=2,convert_graph=2,convert_nodes=2,trt_engine=1,trt python retarget_minimal.py
2020-03-15 13:50:46.779963: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-03-15 13:50:53.583972: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-03-15 13:50:53.586643: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From retarget_minimal.py:17: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2020-03-15 13:50:55.431086: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-03-15 13:50:55.454122: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:55.454409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.377
pciBusID: 0000:00:00.0
2020-03-15 13:50:55.454497: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-03-15 13:50:55.454633: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-03-15 13:50:55.459760: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-03-15 13:50:55.461674: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-03-15 13:50:55.467475: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-03-15 13:50:55.471282: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-03-15 13:50:55.471531: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-15 13:50:55.471969: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:55.472433: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:55.472570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-03-15 13:50:55.512621: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2020-03-15 13:50:55.514099: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x30c35940 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-15 13:50:55.514260: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-03-15 13:50:55.647628: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:55.649041: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3218eaa0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-03-15 13:50:55.649204: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Xavier, Compute Capability 7.2
2020-03-15 13:50:55.650509: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:55.650741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.377
pciBusID: 0000:00:00.0
2020-03-15 13:50:55.651222: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-03-15 13:50:55.651430: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-03-15 13:50:55.651722: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-03-15 13:50:55.651853: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-03-15 13:50:55.651926: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-03-15 13:50:55.651983: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-03-15 13:50:55.652035: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-15 13:50:55.652469: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:55.652770: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:55.652961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-03-15 13:50:55.653097: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-03-15 13:50:58.239564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-15 13:50:58.239774: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2020-03-15 13:50:58.239883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2020-03-15 13:50:58.240882: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:58.241320: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:58.241732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 20048 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
WARNING:tensorflow:From retarget_minimal.py:18: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /home/standard/envs/video_pipeline/lib/python3.6/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py:653: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From retarget_minimal.py:30: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

WARNING:tensorflow:From retarget_minimal.py:34: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From retarget_minimal.py:34: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
WARNING:tensorflow:From /home/standard/envs/video_pipeline/lib/python3.6/site-packages/tensorflow_core/python/framework/graph_util_impl.py:277: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
WARNING:tensorflow:From retarget_minimal.py:36: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

2020-03-15 13:51:06.308794: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-03-15 13:51:06.373271: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:51:06.373642: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2020-03-15 13:51:06.374067: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-03-15 13:51:06.376474: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:51:06.376688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.377
pciBusID: 0000:00:00.0
2020-03-15 13:51:06.376777: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-03-15 13:51:06.376839: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-03-15 13:51:06.376906: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-03-15 13:51:06.376963: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-03-15 13:51:06.377018: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-03-15 13:51:06.377072: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-03-15 13:51:06.377136: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-15 13:51:06.377407: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:51:06.377794: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:51:06.377960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-03-15 13:51:06.378056: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-15 13:51:06.378095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2020-03-15 13:51:06.378133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2020-03-15 13:51:06.378409: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:51:06.378794: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:51:06.379040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 20048 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2020-03-15 13:51:06.447339: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:439] Not a TF-TRT candidate, (Op type: NoOp), (Op name: _SOURCE), (Reason: Unimplemented: Op type NoOp is not supported.)
2020-03-15 13:51:06.447474: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:439] Not a TF-TRT candidate, (Op type: NoOp), (Op name: _SINK), (Reason: Unimplemented: Op type NoOp is not supported.)
2020-03-15 13:51:06.447521: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:429] Not a TF-TRT candidate, (Op type: Placeholder), (Op name: Placeholder), (Reason: excluded by segmenter option)
2020-03-15 13:51:06.447611: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn1/Const
2020-03-15 13:51:06.447669: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn1/beta/read
2020-03-15 13:51:06.447741: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn1/moving_mean/read
2020-03-15 13:51:06.447831: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn1/moving_variance/read
2020-03-15 13:51:06.448104: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: FusedBatchNormV2), (Op name: bn1/FusedBatchNormV2
2020-03-15 13:51:06.448211: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn2/Const
2020-03-15 13:51:06.448324: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn2/beta/read
2020-03-15 13:51:06.448382: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn2/moving_mean/read
2020-03-15 13:51:06.448431: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn2/moving_variance/read
2020-03-15 13:51:06.448477: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn2/FusedBatchNormV2-0-PermConstNHWCToNCHW-LayoutOptimizer
2020-03-15 13:51:06.448575: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Transpose), (Op name: bn2/FusedBatchNormV2-0-TransposeNHWCToNCHW-LayoutOptimizer
2020-03-15 13:51:06.448709: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: FusedBatchNormV2), (Op name: bn2/FusedBatchNormV2
2020-03-15 13:51:06.448757: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Identity), (Op name: Identity
2020-03-15 13:51:06.448823: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: Identity-0-0-PermConstNCHWToNHWC-LayoutOptimizer
2020-03-15 13:51:06.448923: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Transpose), (Op name: Identity-0-0-TransposeNCHWToNHWC-LayoutOptimizer
2020-03-15 13:51:06.448962: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:429] Not a TF-TRT candidate, (Op type: Identity), (Op name: Identity_1), (Reason: excluded by segmenter option)
2020-03-15 13:51:06.449017: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 4 ops of 3 different types in the graph that are not converted to TensorRT: Identity, NoOp, Placeholder, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2020-03-15 13:51:06.449252: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn1/Const has no device assigned requested device is: 
2020-03-15 13:51:06.449289: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn1/beta/read has no device assigned requested device is: 
2020-03-15 13:51:06.449330: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn1/moving_mean/read has no device assigned requested device is: 
2020-03-15 13:51:06.449362: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn1/moving_variance/read has no device assigned requested device is: 
2020-03-15 13:51:06.449391: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn1/FusedBatchNormV2 has no device assigned requested device is: 
2020-03-15 13:51:06.449424: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn2/Const has no device assigned requested device is: 
2020-03-15 13:51:06.449456: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn2/beta/read has no device assigned requested device is: 
2020-03-15 13:51:06.449487: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn2/moving_mean/read has no device assigned requested device is: 
2020-03-15 13:51:06.449519: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn2/moving_variance/read has no device assigned requested device is: 
2020-03-15 13:51:06.449560: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn2/FusedBatchNormV2 has no device assigned requested device is: 
2020-03-15 13:51:06.449593: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node Identity has no device assigned requested device is: 
2020-03-15 13:51:06.449632: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 15
2020-03-15 13:51:06.449696: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 15
2020-03-15 13:51:06.449760: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:680] Nodes in segment 0 with parent=Identity-0-0-PermConstNCHWToNHWC-LayoutOptimizer:
[Op type: Const] bn1/Const
[Op type: Const] bn1/beta/read
[Op type: Const] bn1/moving_mean/read
[Op type: Const] bn1/moving_variance/read
[Op type: FusedBatchNormV2] bn1/FusedBatchNormV2
[Op type: Const] bn2/Const
[Op type: Const] bn2/beta/read
[Op type: Const] bn2/moving_mean/read
[Op type: Const] bn2/moving_variance/read
[Op type: Const] bn2/FusedBatchNormV2-0-PermConstNHWCToNCHW-LayoutOptimizer
[Op type: Transpose] bn2/FusedBatchNormV2-0-TransposeNHWCToNCHW-LayoutOptimizer
[Op type: FusedBatchNormV2] bn2/FusedBatchNormV2
[Op type: Identity] Identity
[Op type: Const] Identity-0-0-PermConstNCHWToNHWC-LayoutOptimizer
[Op type: Transpose] Identity-0-0-TransposeNCHWToNHWC-LayoutOptimizer
2020-03-15 13:51:06.449855: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:719] Devices Segment : 'Identity-0-0-PermConstNCHWToNHWC-LayoutOptimizer' /job:localhost/replica:0/task:0/device:GPU:0, 
2020-03-15 13:51:06.449907: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:633] Number of TensorRT candidate segments: 1
2020-03-15 13:51:06.450061: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node bn2/Const neither have requested device nor assigned device
2020-03-15 13:51:06.450103: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node bn1/beta/read neither have requested device nor assigned device
2020-03-15 13:51:06.450134: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node bn1/moving_variance/read neither have requested device nor assigned device
2020-03-15 13:51:06.450166: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node bn2/beta/read neither have requested device nor assigned device
2020-03-15 13:51:06.450201: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node bn1/moving_mean/read neither have requested device nor assigned device
2020-03-15 13:51:06.450235: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node bn2/moving_mean/read neither have requested device nor assigned device
2020-03-15 13:51:06.450273: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node bn1/Const neither have requested device nor assigned device
2020-03-15 13:51:06.450311: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node bn1/FusedBatchNormV2 neither have requested device nor assigned device
2020-03-15 13:51:06.450345: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:208] Input edge = Placeholder:0
2020-03-15 13:51:06.450386: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node bn2/moving_variance/read neither have requested device nor assigned device
2020-03-15 13:51:06.450418: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node bn2/FusedBatchNormV2 neither have requested device nor assigned device
2020-03-15 13:51:06.450448: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node Identity neither have requested device nor assigned device
2020-03-15 13:51:06.450489: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:240] Output edge = Identity-0-0-TransposeNCHWToNHWC-LayoutOptimizer:0
2020-03-15 13:51:06.450602: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5693] Constructing input TensorRTInputPH_0 for the edge Placeholder:0 -> bn1/FusedBatchNormV2:0
2020-03-15 13:51:06.450683: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5715] Constructing output TensorRTOutputPH_0 for the edge Identity-0-0-TransposeNCHWToNHWC-LayoutOptimizer:0 -> Identity_1:0
2020-03-15 13:51:06.450739: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying bn2/FusedBatchNormV2-0-PermConstNHWCToNCHW-LayoutOptimizer to subgraph
2020-03-15 13:51:06.450777: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying bn2/Const to subgraph
2020-03-15 13:51:06.450817: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying bn1/beta/read to subgraph
2020-03-15 13:51:06.450865: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying bn1/moving_variance/read to subgraph
2020-03-15 13:51:06.450918: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying bn2/beta/read to subgraph
2020-03-15 13:51:06.451094: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying Identity-0-0-PermConstNCHWToNHWC-LayoutOptimizer to subgraph
2020-03-15 13:51:06.451184: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying bn1/moving_mean/read to subgraph
2020-03-15 13:51:06.451242: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying bn2/moving_mean/read to subgraph
2020-03-15 13:51:06.451317: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying bn1/Const to subgraph
2020-03-15 13:51:06.451387: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying bn1/FusedBatchNormV2 to subgraph
2020-03-15 13:51:06.451426: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying bn2/FusedBatchNormV2-0-TransposeNHWCToNCHW-LayoutOptimizer to subgraph
2020-03-15 13:51:06.451510: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying bn2/moving_variance/read to subgraph
2020-03-15 13:51:06.451553: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying bn2/FusedBatchNormV2 to subgraph
2020-03-15 13:51:06.451588: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying Identity to subgraph
2020-03-15 13:51:06.451659: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying Identity-0-0-TransposeNCHWToNHWC-LayoutOptimizer to subgraph
2020-03-15 13:51:06.451742: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5740] Updating bn1/FusedBatchNormV2:0 from Placeholder to TensorRTInputPH_0
2020-03-15 13:51:06.451858: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:262] Converted TensorRT candidate segment 'TRTEngineOp_0' to a GraphDef
2020-03-15 13:51:06.452858: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:543] Adding funcdef TRTEngineOp_0_native_segment to graphlib
2020-03-15 13:51:06.472543: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:698] Current cuda device is 0
2020-03-15 13:51:06.472707: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:710] Assigned 5000000000 bytes to TRTEngineOp_0
2020-03-15 13:51:06.472795: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:590] Using allocator GPU_0_bfc and cuda_device_id 0
2020-03-15 13:51:06.472888: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:334] Processing TRTEngineOp_0
2020-03-15 13:51:06.472950: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:398] Engine Input Placeholder:0 -> TRTEngineOp_0:0
2020-03-15 13:51:06.473022: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:442] TRTEngineOp_0 inputs= Placeholder:0 
2020-03-15 13:51:06.474307: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:469] Adding TRTEngine TRTEngineOp_0 to graph
2020-03-15 13:51:06.474443: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:487] input_nodes size = 1
2020-03-15 13:51:06.474485: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:491] Connecting data edge from Placeholder:0 to TRTEngineOp_0:0
2020-03-15 13:51:06.474548: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:514] Updating data edge from TRTEngineOp_0:0 to Identity_1:0
2020-03-15 13:51:06.474648: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 15 nodes succeeded.
2020-03-15 13:51:06.474737: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:744] Segment consists of nodes: bn1/Const, bn1/beta/read, bn1/moving_mean/read, bn1/moving_variance/read, bn1/FusedBatchNormV2, bn2/Const, bn2/beta/read, bn2/moving_mean/read, bn2/moving_variance/read, bn2/FusedBatchNormV2-0-PermConstNHWCToNCHW-LayoutOptimizer, bn2/FusedBatchNormV2-0-TransposeNHWCToNCHW-LayoutOptimizer, bn2/FusedBatchNormV2, Identity, Identity-0-0-PermConstNCHWToNHWC-LayoutOptimizer, Identity-0-0-TransposeNCHWToNHWC-LayoutOptimizer, 
2020-03-15 13:51:06.475088: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:757] Returning from conversion
2020-03-15 13:51:06.491678: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-03-15 13:51:06.498760: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:839] Optimization results for grappler item: tf_graph
2020-03-15 13:51:06.498876: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 13 nodes (-6), 12 edges (-6), time = 4.624ms.
2020-03-15 13:51:06.498934: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   layout: Graph size after: 17 nodes (4), 16 edges (4), time = 8.217ms.
2020-03-15 13:51:06.499000: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 17 nodes (0), 16 edges (0), time = 3.929ms.
2020-03-15 13:51:06.499034: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   TensorRTOptimizer: Graph size after: 3 nodes (-14), 2 edges (-14), time = 29.631ms.
2020-03-15 13:51:06.499064: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 3 nodes (0), 2 edges (0), time = 3.784ms.
2020-03-15 13:51:06.499090: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:839] Optimization results for grappler item: TRTEngineOp_0_native_segment
2020-03-15 13:51:06.499128: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 17 nodes (0), 16 edges (0), time = 2.018ms.
2020-03-15 13:51:06.499159: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   layout: Graph size after: 17 nodes (0), 16 edges (0), time = 2.16ms.
2020-03-15 13:51:06.499194: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 17 nodes (0), 16 edges (0), time = 2.598ms.
2020-03-15 13:51:06.499225: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   TensorRTOptimizer: Graph size after: 17 nodes (0), 16 edges (0), time = 0.214ms.
2020-03-15 13:51:06.499254: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 17 nodes (0), 16 edges (0), time = 2.389ms.

Running with the --with_conv flag triggers the issue. Not only does TensorRT fail to convert the FusedBatchNormV2 op inside the convolution layer, but the other two ops (bn1 and bn2) now fail to convert as well. For example:

2020-03-15 13:50:24.967573: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:439] Not a TF-TRT candidate, (Op type: FusedBatchNormV2), (Op name: bn1/FusedBatchNormV2), (Reason: Unimplemented: FusedBatchNormV2 only supports data_format=NCHW, at bn1/FusedBatchNormV2)
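
To make the verbose output easier to scan, here is a minimal sketch of a helper that pulls the rejected ops and their reasons out of a log like the one in this issue. It also lists any nodes whose names end in `-LayoutOptimizer`. The function name `summarize` and the regular expressions are my own assumptions, written against the message text seen in this log and not against any documented format. My reading, which I have not confirmed, is that `-LayoutOptimizer` nodes such as `TransposeNCHWToNHWC` mean grappler's layout pass rewrote part of the graph before the TF-TRT segmenter ran, which would be consistent with the data_format complaint.

```python
import re

# Pattern for the rejection lines printed by segment.cc in this log.
# Assumption: the message text matches what this TF 1.15 build prints.
REJECTED = re.compile(
    r"Not a TF-TRT candidate, \(Op type: (?P<op>[^)]+)\), "
    r"\(Op name: (?P<name>[^)]+)\), \(Reason: (?P<reason>.*)\)\s*$"
)
# Nodes that appear in the log with a -LayoutOptimizer suffix.
LAYOUT_NODE = re.compile(r"Op name: (?P<name>\S+-LayoutOptimizer)")


def summarize(log_text):
    """Return ({op_name: (op_type, reason)}, [layout optimizer node names])."""
    rejected = {}
    layout_nodes = []
    for line in log_text.splitlines():
        match = REJECTED.search(line)
        if match:
            rejected[match.group("name")] = (match.group("op"), match.group("reason"))
            continue
        match = LAYOUT_NODE.search(line)
        if match:
            layout_nodes.append(match.group("name"))
    return rejected, layout_nodes


# Two lines copied from the log in this issue.
SAMPLE = """\
2020-03-15 13:50:24.967367: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Transpose), (Op name: bn1/FusedBatchNormV2-0-TransposeNCHWToNHWC-LayoutOptimizer
2020-03-15 13:50:24.967573: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:439] Not a TF-TRT candidate, (Op type: FusedBatchNormV2), (Op name: bn1/FusedBatchNormV2), (Reason: Unimplemented: FusedBatchNormV2 only supports data_format=NCHW, at bn1/FusedBatchNormV2)
"""

if __name__ == "__main__":
    rejected, layout_nodes = summarize(SAMPLE)
    for name, (op_type, reason) in rejected.items():
        print(f"{name} [{op_type}]: {reason}")
    print("nodes added by the layout pass:", layout_nodes)
```

Saving the full stderr output to a file and passing its contents to `summarize` should give one entry per rejected op, which makes it easier to compare the runs with and without --with_conv.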

TF_CPP_VMODULE=segment=2,convert_graph=2,convert_nodes=2,trt_engine=1,trt python retarget_minimal.py --with_conv
2020-03-15 13:50:05.373096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-03-15 13:50:12.779314: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-03-15 13:50:12.782559: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From retarget_minimal.py:17: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2020-03-15 13:50:14.631535: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-03-15 13:50:14.655909: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:14.656805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.377
pciBusID: 0000:00:00.0
2020-03-15 13:50:14.656949: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-03-15 13:50:14.657131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-03-15 13:50:14.661989: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-03-15 13:50:14.664867: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-03-15 13:50:14.672712: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-03-15 13:50:14.677702: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-03-15 13:50:14.677961: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-15 13:50:14.678361: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:14.678927: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:14.680395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-03-15 13:50:14.726453: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2020-03-15 13:50:14.728235: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xf0ce120 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-15 13:50:14.728448: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-03-15 13:50:14.848609: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:14.849492: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xf75c3a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-03-15 13:50:14.849641: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Xavier, Compute Capability 7.2
2020-03-15 13:50:14.850446: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:14.850684: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.377
pciBusID: 0000:00:00.0
2020-03-15 13:50:14.850826: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-03-15 13:50:14.850921: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-03-15 13:50:14.851077: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-03-15 13:50:14.851158: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-03-15 13:50:14.851345: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-03-15 13:50:14.851453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-03-15 13:50:14.851542: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-15 13:50:14.851753: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:14.852008: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:14.852150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-03-15 13:50:14.852410: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-03-15 13:50:17.468948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-15 13:50:17.469228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2020-03-15 13:50:17.469294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2020-03-15 13:50:17.470084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:17.470579: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:17.471050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 20053 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
WARNING:tensorflow:From retarget_minimal.py:18: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /home/standard/envs/video_pipeline/lib/python3.6/site-packages/tensorflow_core/contrib/layers/python/layers/layers.py:653: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
WARNING:tensorflow:From retarget_minimal.py:30: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

WARNING:tensorflow:From retarget_minimal.py:34: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From retarget_minimal.py:34: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
WARNING:tensorflow:From /home/standard/envs/video_pipeline/lib/python3.6/site-packages/tensorflow_core/python/framework/graph_util_impl.py:277: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
WARNING:tensorflow:From retarget_minimal.py:36: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

2020-03-15 13:50:24.714985: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-03-15 13:50:24.846646: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:24.846893: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2020-03-15 13:50:24.847275: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-03-15 13:50:24.850350: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:24.850580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.377
pciBusID: 0000:00:00.0
2020-03-15 13:50:24.850669: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-03-15 13:50:24.850736: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-03-15 13:50:24.850812: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-03-15 13:50:24.850871: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-03-15 13:50:24.850928: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-03-15 13:50:24.850986: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-03-15 13:50:24.851038: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-15 13:50:24.851285: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:24.851554: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:24.851675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2020-03-15 13:50:24.851773: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-15 13:50:24.851813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2020-03-15 13:50:24.851846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2020-03-15 13:50:24.852060: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:24.852444: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2020-03-15 13:50:24.852607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 20053 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2020-03-15 13:50:24.965642: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:439] Not a TF-TRT candidate, (Op type: NoOp), (Op name: _SOURCE), (Reason: Unimplemented: Op type NoOp is not supported.)
2020-03-15 13:50:24.965835: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:439] Not a TF-TRT candidate, (Op type: NoOp), (Op name: _SINK), (Reason: Unimplemented: Op type NoOp is not supported.)
2020-03-15 13:50:24.966011: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:429] Not a TF-TRT candidate, (Op type: Placeholder), (Op name: Placeholder), (Reason: excluded by segmenter option)
2020-03-15 13:50:24.966147: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn1/Const
2020-03-15 13:50:24.966238: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn1/beta/read
2020-03-15 13:50:24.966367: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn1/moving_mean/read
2020-03-15 13:50:24.966448: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn1/moving_variance/read
2020-03-15 13:50:24.966503: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn2/Const
2020-03-15 13:50:24.966586: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn2/beta/read
2020-03-15 13:50:24.966711: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn2/moving_mean/read
2020-03-15 13:50:24.966761: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn2/moving_variance/read
2020-03-15 13:50:24.966844: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: Conv/weights/read
2020-03-15 13:50:24.966928: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: Conv/BatchNorm/Const
2020-03-15 13:50:24.967052: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: Conv/BatchNorm/beta/read
2020-03-15 13:50:24.967102: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: Conv/BatchNorm/moving_mean/read
2020-03-15 13:50:24.967147: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: Conv/BatchNorm/moving_variance/read
2020-03-15 13:50:24.967222: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn1/FusedBatchNormV2-0-PermConstNCHWToNHWC-LayoutOptimizer
2020-03-15 13:50:24.967367: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Transpose), (Op name: bn1/FusedBatchNormV2-0-TransposeNCHWToNHWC-LayoutOptimizer
2020-03-15 13:50:24.967573: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:439] Not a TF-TRT candidate, (Op type: FusedBatchNormV2), (Op name: bn1/FusedBatchNormV2), (Reason: Unimplemented: FusedBatchNormV2 only supports data_format=NCHW, at bn1/FusedBatchNormV2)
2020-03-15 13:50:24.967654: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: bn1/FusedBatchNormV2-0-0-PermConstNHWCToNCHW-LayoutOptimizer
2020-03-15 13:50:24.967747: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Transpose), (Op name: bn1/FusedBatchNormV2-0-0-TransposeNHWCToNCHW-LayoutOptimizer
2020-03-15 13:50:24.967930: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:439] Not a TF-TRT candidate, (Op type: FusedBatchNormV2), (Op name: bn2/FusedBatchNormV2), (Reason: Unimplemented: FusedBatchNormV2 only supports data_format=NCHW, at bn2/FusedBatchNormV2)
2020-03-15 13:50:24.968016: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: Conv/Conv2D-0-PermConstNCHWToNHWC-LayoutOptimizer
2020-03-15 13:50:24.968113: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Transpose), (Op name: Conv/Conv2D-0-TransposeNCHWToNHWC-LayoutOptimizer
2020-03-15 13:50:24.968297: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Conv2D), (Op name: Conv/Conv2D
2020-03-15 13:50:24.968449: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:439] Not a TF-TRT candidate, (Op type: FusedBatchNormV2), (Op name: Conv/BatchNorm/FusedBatchNormV2), (Reason: Unimplemented: FusedBatchNormV2 only supports data_format=NCHW, at Conv/BatchNorm/FusedBatchNormV2)
2020-03-15 13:50:24.968503: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Relu), (Op name: Conv/Relu
2020-03-15 13:50:24.968543: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Identity), (Op name: Identity
2020-03-15 13:50:24.968591: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Const), (Op name: Identity-0-0-PermConstNHWCToNCHW-LayoutOptimizer
2020-03-15 13:50:24.968650: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:447] Accepted as a TF-TRT candidate, (Op type: Transpose), (Op name: Identity-0-0-TransposeNHWCToNCHW-LayoutOptimizer
2020-03-15 13:50:24.968684: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:429] Not a TF-TRT candidate, (Op type: Identity), (Op name: Identity_1), (Reason: excluded by segmenter option)
2020-03-15 13:50:24.968718: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 7 ops of 4 different types in the graph that are not converted to TensorRT: Identity, FusedBatchNormV2, NoOp, Placeholder, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2020-03-15 13:50:24.968875: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn1/Const has no device assigned requested device is: 
2020-03-15 13:50:24.968906: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn1/beta/read has no device assigned requested device is: 
2020-03-15 13:50:24.968935: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn1/moving_mean/read has no device assigned requested device is: 
2020-03-15 13:50:24.969014: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn1/moving_variance/read has no device assigned requested device is: 
2020-03-15 13:50:24.969046: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn2/Const has no device assigned requested device is: 
2020-03-15 13:50:24.969075: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn2/beta/read has no device assigned requested device is: 
2020-03-15 13:50:24.969104: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn2/moving_mean/read has no device assigned requested device is: 
2020-03-15 13:50:24.969131: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node bn2/moving_variance/read has no device assigned requested device is: 
2020-03-15 13:50:24.969159: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node Conv/weights/read has no device assigned requested device is: 
2020-03-15 13:50:24.969207: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node Conv/BatchNorm/Const has no device assigned requested device is: 
2020-03-15 13:50:24.969243: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node Conv/BatchNorm/beta/read has no device assigned requested device is: 
2020-03-15 13:50:24.969270: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node Conv/BatchNorm/moving_mean/read has no device assigned requested device is: 
2020-03-15 13:50:24.969299: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node Conv/BatchNorm/moving_variance/read has no device assigned requested device is: 
2020-03-15 13:50:24.969372: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node Conv/Conv2D has no device assigned requested device is: 
2020-03-15 13:50:24.969407: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node Conv/Relu has no device assigned requested device is: 
2020-03-15 13:50:24.969440: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:576] Node Identity has no device assigned requested device is: 
2020-03-15 13:50:24.969479: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 1
2020-03-15 13:50:24.969515: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5790] --> Need to remove output node Conv/BatchNorm/Const which is a Const.
2020-03-15 13:50:24.969555: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 0
2020-03-15 13:50:24.969591: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 1
2020-03-15 13:50:24.969621: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5790] --> Need to remove output node Conv/BatchNorm/beta/read which is a Const.
2020-03-15 13:50:24.969653: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 0
2020-03-15 13:50:24.969679: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 1
2020-03-15 13:50:24.969719: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5790] --> Need to remove output node Conv/BatchNorm/moving_mean/read which is a Const.
2020-03-15 13:50:24.969752: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 0
2020-03-15 13:50:24.969779: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 1
2020-03-15 13:50:24.969807: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5790] --> Need to remove output node Conv/BatchNorm/moving_variance/read which is a Const.
2020-03-15 13:50:24.969845: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 0
2020-03-15 13:50:24.969873: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 4
2020-03-15 13:50:24.969916: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 4
2020-03-15 13:50:24.969944: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 4
2020-03-15 13:50:24.969977: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 4
2020-03-15 13:50:24.970003: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 1
2020-03-15 13:50:24.970031: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5790] --> Need to remove output node bn1/Const which is a Const.
2020-03-15 13:50:24.970061: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 0
2020-03-15 13:50:24.970088: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 2
2020-03-15 13:50:24.970123: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 2
2020-03-15 13:50:24.970151: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 2
2020-03-15 13:50:24.970186: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 2
2020-03-15 13:50:24.970213: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 1
2020-03-15 13:50:24.970256: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5790] --> Need to remove output node bn1/beta/read which is a Const.
2020-03-15 13:50:24.970289: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 0
2020-03-15 13:50:24.970316: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 1
2020-03-15 13:50:24.970341: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5790] --> Need to remove output node bn1/moving_mean/read which is a Const.
2020-03-15 13:50:24.970404: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 0
2020-03-15 13:50:24.970433: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 1
2020-03-15 13:50:24.970459: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5790] --> Need to remove output node bn1/moving_variance/read which is a Const.
2020-03-15 13:50:24.970489: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 0
2020-03-15 13:50:24.970515: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 1
2020-03-15 13:50:24.970539: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5790] --> Need to remove output node bn2/Const which is a Const.
2020-03-15 13:50:24.970569: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 0
2020-03-15 13:50:24.970596: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 1
2020-03-15 13:50:24.970623: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5790] --> Need to remove output node bn2/beta/read which is a Const.
2020-03-15 13:50:24.970653: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 0
2020-03-15 13:50:24.970682: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 1
2020-03-15 13:50:24.970708: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5790] --> Need to remove output node bn2/moving_mean/read which is a Const.
2020-03-15 13:50:24.970742: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 0
2020-03-15 13:50:24.970770: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:587] Segment original size: 1
2020-03-15 13:50:24.970797: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5790] --> Need to remove output node bn2/moving_variance/read which is a Const.
2020-03-15 13:50:24.970828: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:666] Segment new size: 0
2020-03-15 13:50:24.970860: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 0 has only 0 effective nodes, dropping
2020-03-15 13:50:24.970889: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 0 has only 0 effective nodes, dropping
2020-03-15 13:50:24.970918: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 0 has only 0 effective nodes, dropping
2020-03-15 13:50:24.970945: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 0 has only 0 effective nodes, dropping
2020-03-15 13:50:24.970985: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:680] Nodes in segment 0 with parent=Conv/Conv2D-0-PermConstNCHWToNHWC-LayoutOptimizer:
[Op type: Const] Conv/weights/read
[Op type: Const] Conv/Conv2D-0-PermConstNCHWToNHWC-LayoutOptimizer
[Op type: Transpose] Conv/Conv2D-0-TransposeNCHWToNHWC-LayoutOptimizer
[Op type: Conv2D] Conv/Conv2D
2020-03-15 13:50:24.971055: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:680] Nodes in segment 1 with parent=Identity-0-0-PermConstNHWCToNCHW-LayoutOptimizer:
[Op type: Relu] Conv/Relu
[Op type: Identity] Identity
[Op type: Const] Identity-0-0-PermConstNHWCToNCHW-LayoutOptimizer
[Op type: Transpose] Identity-0-0-TransposeNHWCToNCHW-LayoutOptimizer
2020-03-15 13:50:24.971103: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 2 has only 0 effective nodes, dropping
2020-03-15 13:50:24.971138: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:680] Nodes in segment 2 with parent=bn1/FusedBatchNormV2-0-0-PermConstNHWCToNCHW-LayoutOptimizer:
[Op type: Const] bn1/FusedBatchNormV2-0-0-PermConstNHWCToNCHW-LayoutOptimizer
[Op type: Transpose] bn1/FusedBatchNormV2-0-0-TransposeNHWCToNCHW-LayoutOptimizer
2020-03-15 13:50:24.971171: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 2 has only 2 effective nodes, dropping
2020-03-15 13:50:24.971203: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:680] Nodes in segment 2 with parent=bn1/FusedBatchNormV2-0-PermConstNCHWToNHWC-LayoutOptimizer:
[Op type: Const] bn1/FusedBatchNormV2-0-PermConstNCHWToNHWC-LayoutOptimizer
[Op type: Transpose] bn1/FusedBatchNormV2-0-TransposeNCHWToNHWC-LayoutOptimizer
2020-03-15 13:50:24.971236: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 2 has only 2 effective nodes, dropping
2020-03-15 13:50:24.971268: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 2 has only 0 effective nodes, dropping
2020-03-15 13:50:24.971335: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 2 has only 0 effective nodes, dropping
2020-03-15 13:50:24.971364: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 2 has only 0 effective nodes, dropping
2020-03-15 13:50:24.971393: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 2 has only 0 effective nodes, dropping
2020-03-15 13:50:24.971422: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 2 has only 0 effective nodes, dropping
2020-03-15 13:50:24.971478: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 2 has only 0 effective nodes, dropping
2020-03-15 13:50:24.971507: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:693] Segment 2 has only 0 effective nodes, dropping
2020-03-15 13:50:24.971540: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:719] Devices Segment : 'Conv/Conv2D-0-PermConstNCHWToNHWC-LayoutOptimizer' /job:localhost/replica:0/task:0/device:GPU:0, 
2020-03-15 13:50:24.971578: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:719] Devices Segment : 'Identity-0-0-PermConstNHWCToNCHW-LayoutOptimizer' /job:localhost/replica:0/task:0/device:GPU:0, 
2020-03-15 13:50:24.971607: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:719] Devices Segment : 'bn1/FusedBatchNormV2-0-PermConstNCHWToNHWC-LayoutOptimizer' /job:localhost/replica:0/task:0/device:GPU:0, 
2020-03-15 13:50:24.971633: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:719] Devices Segment : 'bn1/FusedBatchNormV2-0-0-PermConstNHWCToNCHW-LayoutOptimizer' /job:localhost/replica:0/task:0/device:GPU:0, 
2020-03-15 13:50:24.971688: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:633] Number of TensorRT candidate segments: 2
2020-03-15 13:50:24.971765: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node Conv/weights/read neither have requested device nor assigned device
2020-03-15 13:50:24.971812: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:208] Input edge = bn2/FusedBatchNormV2:0
2020-03-15 13:50:24.971847: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node Conv/Conv2D neither have requested device nor assigned device
2020-03-15 13:50:24.971879: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:240] Output edge = Conv/Conv2D:0
2020-03-15 13:50:24.971984: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5693] Constructing input TensorRTInputPH_0 for the edge bn2/FusedBatchNormV2:0 -> Conv/Conv2D-0-TransposeNCHWToNHWC-LayoutOptimizer:0
2020-03-15 13:50:24.972088: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5715] Constructing output TensorRTOutputPH_0 for the edge Conv/Conv2D:0 -> Conv/BatchNorm/FusedBatchNormV2:0
2020-03-15 13:50:24.972165: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying Conv/weights/read to subgraph
2020-03-15 13:50:24.972207: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying Conv/Conv2D-0-PermConstNCHWToNHWC-LayoutOptimizer to subgraph
2020-03-15 13:50:24.972282: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying Conv/Conv2D-0-TransposeNCHWToNHWC-LayoutOptimizer to subgraph
2020-03-15 13:50:24.972352: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying Conv/Conv2D to subgraph
2020-03-15 13:50:24.972389: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5740] Updating Conv/Conv2D-0-TransposeNCHWToNHWC-LayoutOptimizer:0 from bn2/FusedBatchNormV2 to TensorRTInputPH_0
2020-03-15 13:50:24.972434: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:262] Converted TensorRT candidate segment 'Conv/TRTEngineOp_0' to a GraphDef
2020-03-15 13:50:24.973319: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:543] Adding funcdef Conv/TRTEngineOp_0_native_segment to graphlib
2020-03-15 13:50:24.994583: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node Conv/Relu neither have requested device nor assigned device
2020-03-15 13:50:24.994701: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:208] Input edge = Conv/BatchNorm/FusedBatchNormV2:0
2020-03-15 13:50:24.994792: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:164] Node Identity neither have requested device nor assigned device
2020-03-15 13:50:24.994830: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:240] Output edge = Identity-0-0-TransposeNHWCToNCHW-LayoutOptimizer:0
2020-03-15 13:50:24.995027: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5693] Constructing input TensorRTInputPH_0 for the edge Conv/BatchNorm/FusedBatchNormV2:0 -> Conv/Relu:0
2020-03-15 13:50:24.995107: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5715] Constructing output TensorRTOutputPH_0 for the edge Identity-0-0-TransposeNHWCToNCHW-LayoutOptimizer:0 -> Identity_1:0
2020-03-15 13:50:24.996305: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying Identity-0-0-PermConstNHWCToNCHW-LayoutOptimizer to subgraph
2020-03-15 13:50:24.996356: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying Conv/Relu to subgraph
2020-03-15 13:50:24.996407: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying Identity to subgraph
2020-03-15 13:50:24.996444: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5730] Copying Identity-0-0-TransposeNHWCToNCHW-LayoutOptimizer to subgraph
2020-03-15 13:50:24.996475: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:5740] Updating Conv/Relu:0 from Conv/BatchNorm/FusedBatchNormV2 to TensorRTInputPH_0
2020-03-15 13:50:24.996559: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:262] Converted TensorRT candidate segment 'TRTEngineOp_1' to a GraphDef
2020-03-15 13:50:24.997255: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:543] Adding funcdef TRTEngineOp_1_native_segment to graphlib
2020-03-15 13:50:24.997486: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:698] Current cuda device is 0
2020-03-15 13:50:24.997566: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:710] Assigned 1250000000 bytes to Conv/TRTEngineOp_0
2020-03-15 13:50:24.997611: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:590] Using allocator GPU_0_bfc and cuda_device_id 0
2020-03-15 13:50:24.997690: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:334] Processing Conv/TRTEngineOp_0
2020-03-15 13:50:24.997761: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:398] Engine Input bn2/FusedBatchNormV2:0 -> Conv/TRTEngineOp_0:0
2020-03-15 13:50:24.997810: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:442] Conv/TRTEngineOp_0 inputs= bn2/FusedBatchNormV2:0 
2020-03-15 13:50:24.998037: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:469] Adding TRTEngine Conv/TRTEngineOp_0 to graph
2020-03-15 13:50:24.998105: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:487] input_nodes size = 1
2020-03-15 13:50:24.998145: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:491] Connecting data edge from bn2/FusedBatchNormV2:0 to Conv/TRTEngineOp_0:0
2020-03-15 13:50:24.998178: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:514] Updating data edge from Conv/TRTEngineOp_0:0 to Conv/BatchNorm/FusedBatchNormV2:0
2020-03-15 13:50:24.998245: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node Conv/TRTEngineOp_0 added for segment 0 consisting of 4 nodes succeeded.
2020-03-15 13:50:24.998284: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:744] Segment consists of nodes: Conv/weights/read, Conv/Conv2D-0-PermConstNCHWToNHWC-LayoutOptimizer, Conv/Conv2D-0-TransposeNCHWToNHWC-LayoutOptimizer, Conv/Conv2D, 
2020-03-15 13:50:24.998370: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:710] Assigned 1250000000 bytes to TRTEngineOp_1
2020-03-15 13:50:24.998408: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:590] Using allocator GPU_0_bfc and cuda_device_id 0
2020-03-15 13:50:24.998442: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:334] Processing TRTEngineOp_1
2020-03-15 13:50:24.998473: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:398] Engine Input Conv/BatchNorm/FusedBatchNormV2:0 -> TRTEngineOp_1:0
2020-03-15 13:50:24.998548: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:442] TRTEngineOp_1 inputs= Conv/BatchNorm/FusedBatchNormV2:0 
2020-03-15 13:50:24.998689: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:469] Adding TRTEngine TRTEngineOp_1 to graph
2020-03-15 13:50:24.998784: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:487] input_nodes size = 1
2020-03-15 13:50:24.998816: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:491] Connecting data edge from Conv/BatchNorm/FusedBatchNormV2:0 to TRTEngineOp_1:0
2020-03-15 13:50:24.998845: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:514] Updating data edge from TRTEngineOp_1:0 to Identity_1:0
2020-03-15 13:50:24.998896: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 4 nodes succeeded.
2020-03-15 13:50:24.998928: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:744] Segment consists of nodes: Conv/Relu, Identity, Identity-0-0-PermConstNHWCToNCHW-LayoutOptimizer, Identity-0-0-TransposeNHWCToNCHW-LayoutOptimizer, 
2020-03-15 13:50:24.999276: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:757] Returning from conversion
2020-03-15 13:50:25.011224: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-03-15 13:50:25.022684: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-03-15 13:50:25.032467: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:839] Optimization results for grappler item: tf_graph
2020-03-15 13:50:25.032576: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 21 nodes (-10), 20 edges (-10), time = 7.45ms.
2020-03-15 13:50:25.032685: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   layout: Graph size after: 29 nodes (8), 28 edges (8), time = 5.556ms.
2020-03-15 13:50:25.032764: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 29 nodes (0), 28 edges (0), time = 3.094ms.
2020-03-15 13:50:25.032800: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   TensorRTOptimizer: Graph size after: 23 nodes (-6), 22 edges (-6), time = 36.117ms.
2020-03-15 13:50:25.032831: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 23 nodes (0), 22 edges (0), time = 2.915ms.
2020-03-15 13:50:25.032901: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:839] Optimization results for grappler item: Conv/TRTEngineOp_0_native_segment
2020-03-15 13:50:25.032929: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 6 nodes (0), 5 edges (0), time = 1.225ms.
2020-03-15 13:50:25.032953: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   layout: Graph size after: 6 nodes (0), 5 edges (0), time = 1.22ms.
2020-03-15 13:50:25.032980: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 6 nodes (0), 5 edges (0), time = 1.764ms.
2020-03-15 13:50:25.033007: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   TensorRTOptimizer: Graph size after: 6 nodes (0), 5 edges (0), time = 0.215ms.
2020-03-15 13:50:25.033034: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 6 nodes (0), 5 edges (0), time = 1.337ms.
2020-03-15 13:50:25.033061: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:839] Optimization results for grappler item: TRTEngineOp_1_native_segment
2020-03-15 13:50:25.033087: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 6 nodes (0), 5 edges (0), time = 0.933ms.
2020-03-15 13:50:25.033114: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   layout: Graph size after: 6 nodes (0), 5 edges (0), time = 1.03ms.
2020-03-15 13:50:25.033141: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 6 nodes (0), 5 edges (0), time = 1.475ms.
2020-03-15 13:50:25.033173: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   TensorRTOptimizer: Graph size after: 6 nodes (0), 5 edges (0), time = 0.381ms.
2020-03-15 13:50:25.033207: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:841]   constant_folding: Graph size after: 6 nodes (0), 5 edges (0), time = 1.054ms.

I can also provide the frozen graph pb files if those are useful for debugging this issue.
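For anyone reading the verbose log, here is a minimal sketch that pulls out which nodes ended up in each TRTEngineOp. It relies only on the two message shapes visible in the log above ("TensorRT node ... added for segment ..." and "Segment consists of nodes: ..."). The helper names `summarize_segments` and `batchnorm_converted` are hypothetical, not part of TF-TRT, and the message wording may differ in other TensorFlow versions.

```python
import re

# Matches "TensorRT node <name> added for segment <i> consisting of <n> nodes succeeded."
ENGINE_RE = re.compile(r"TensorRT node (\S+) added for segment (\d+) consisting of (\d+) nodes")
# Matches the "Segment consists of nodes: a, b, c, " line that follows it.
NODES_RE = re.compile(r"Segment consists of nodes: (.*)$")


def summarize_segments(log_text):
    """Return {engine_name: [node names]} parsed from TF-TRT verbose log text."""
    segments = {}
    current = None
    for line in log_text.splitlines():
        m = ENGINE_RE.search(line)
        if m:
            current = m.group(1)
            segments[current] = []
            continue
        m = NODES_RE.search(line)
        if m and current is not None:
            # The node list is comma separated with a trailing ", ".
            segments[current] = [n.strip() for n in m.group(1).split(",") if n.strip()]
            current = None
    return segments


def batchnorm_converted(segments):
    """True if any node whose name contains 'FusedBatchNorm' is inside an engine."""
    return any("FusedBatchNorm" in n for nodes in segments.values() for n in nodes)


# Four lines copied from the log above.
sample = """\
2020-03-15 13:50:24.998245: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node Conv/TRTEngineOp_0 added for segment 0 consisting of 4 nodes succeeded.
2020-03-15 13:50:24.998284: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:744] Segment consists of nodes: Conv/weights/read, Conv/Conv2D-0-PermConstNCHWToNHWC-LayoutOptimizer, Conv/Conv2D-0-TransposeNCHWToNHWC-LayoutOptimizer, Conv/Conv2D, 
2020-03-15 13:50:24.998896: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 4 nodes succeeded.
2020-03-15 13:50:24.998928: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:744] Segment consists of nodes: Conv/Relu, Identity, Identity-0-0-PermConstNHWCToNCHW-LayoutOptimizer, Identity-0-0-TransposeNHWCToNCHW-LayoutOptimizer, 
"""

segments = summarize_segments(sample)
for engine, nodes in segments.items():
    print(engine, "->", nodes)
print("FusedBatchNorm inside an engine:", batchnorm_converted(segments))
```

On this log, neither engine contains a FusedBatchNormV2 node: both segments stop at the batch norm boundaries, which matches the behavior described in this issue.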

Thanks for the detailed triage, @danfischetti! @bixia1, can you PTAL?