No significant performance improvement on UNET semantic segmentation model.

Question

dhinkris opened this issue 5 years ago · 6 comments

Hi,
I used TF-TRT to optimize a 3D U-Net segmentation model, but I couldn't see any significant speedup.
On a P100 GPU, with a batch size of 4, averaged over 20 runs:
i) Keras model = 0.78 s
ii) TF-TRT FP32 = 0.74 s
iii) TF-TRT FP16 = 0.74 s

Can somebody help me figure out whether this can be optimized further?
Thank you
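The reported numbers work out to roughly a 5% gain (0.78 / 0.74 ≈ 1.05). When comparing runs like these, it can help to keep warm-up calls out of the average: if TF-TRT builds its engines at runtime (dynamic-op mode), the first call pays the build cost. The sketch below is a generic timing helper, not the reporter's code; `run_once` is a hypothetical stand-in for one inference call on a batch of 4, and the default warm-up count is an arbitrary choice.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object], n_warmup: int = 5, n_runs: int = 20) -> Dict[str, float]:
    """Time run_once() over n_runs calls, after n_warmup untimed warm-up calls."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    for _ in range(n_warmup):
        run_once()  # untimed: absorbs one-off costs such as lazily built engines
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(times),
        "median_s": statistics.median(times),
        "min_s": min(times),
    }


def speedup(baseline_s: float, optimized_s: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


# Timings reported in the question (P100, batch size 4, 20 runs).
print(f"TF-TRT vs Keras: {speedup(0.78, 0.74):.3f}x")  # 1.054x
```

Reporting the median and minimum next to the mean makes it easier to spot whether a few slow outlier runs are hiding a real difference.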

Answer 1 · 2019-12-12T22:47:41.000Z

Hi @dhinkris,

Can I ask how you converted the TensorFlow model to a TensorRT engine?

I ran into problems with the Conv3D operator, so I had to build the network in PyTorch and then export it to ONNX format.

Then I hit another problem with the ONNX parser that ships with TensorRT, which complained about the paddings having size == 8.

Thanks.
Zheng
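The thread doesn't say exactly why the parser rejected the padding. One relevant fact: the ONNX `Pad` operator stores 2 × rank values (all begin pads, then all end pads), so a 4D tensor (N, C, H, W) has 8 pad values while a 5D tensor from a 3D model (N, C, D, H, W) has 10. If the parser only accepts the 8-value (2D) case, 3D padding would be rejected; that is a guess, not something confirmed here. The hypothetical helper below shows how a `torch.nn.functional.pad`-style tuple (pairs listed from the last dimension backwards) maps onto the ONNX layout.

```python
from typing import List, Sequence


def torch_pad_to_onnx_pads(torch_pad: Sequence[int], rank: int) -> List[int]:
    """Convert a torch.nn.functional.pad-style tuple to the ONNX Pad `pads` layout.

    torch order: (last_begin, last_end, second_last_begin, second_last_end, ...)
    ONNX order:  [dim0_begin, ..., dim{rank-1}_begin, dim0_end, ..., dim{rank-1}_end]
    """
    if len(torch_pad) % 2 != 0 or len(torch_pad) // 2 > rank:
        raise ValueError("torch pad must have an even length of at most 2 * rank")
    begins = [0] * rank
    ends = [0] * rank
    for k in range(len(torch_pad) // 2):
        dim = rank - 1 - k  # the k-th pair pads the k-th dimension counted from the end
        begins[dim] = torch_pad[2 * k]
        ends[dim] = torch_pad[2 * k + 1]
    return begins + ends


print(len(torch_pad_to_onnx_pads((1, 1, 1, 1), rank=4)))        # 8  (N, C, H, W)
print(len(torch_pad_to_onnx_pads((1, 1, 1, 1, 1, 1), rank=5)))  # 10 (N, C, D, H, W)
```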

Answer 2 · 2019-12-14T00:30:44.000Z

@dhinkris Could you run TF-TRT with verbose logging and attach the log here?

What version of TF and TRT did you use?
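Verbose logging is controlled through environment variables that TensorFlow's C++ logger reads. A minimal sketch for setting them from inside a Python script, assuming you can't easily prefix the command line; the `TF_CPP_VMODULE` value is the one quoted later in this thread, and the function name is hypothetical. To be safe, set the variables before TensorFlow is imported.

```python
import os
import sys
import warnings

# Module list copied from the TF_CPP_VMODULE value quoted later in this thread.
TFTRT_VMODULE = "segment=2,convert_graph=2,convert_nodes=2,trt_engine=1,trt_logger=2"


def enable_tftrt_verbose_logging(vmodule: str = TFTRT_VMODULE) -> None:
    """Set TensorFlow's C++ logging variables for TF-TRT debugging."""
    if "tensorflow" in sys.modules:
        # Setting these after TensorFlow has loaded may have no effect.
        warnings.warn("tensorflow is already imported; logging settings may not take effect")
    os.environ["TF_CPP_VMODULE"] = vmodule
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"  # 0 = do not filter INFO messages


enable_tftrt_verbose_logging()
# import tensorflow as tf  # import only after the variables are set
```

This is equivalent to running `TF_CPP_VMODULE=... python your_script.py` from the shell.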

Answer 3 · 2019-12-24T16:00:18.000Z

@pooyadavoodi Below are the logs.
For FP32:
2019-12-24 10:55:50.344930: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:716] Optimization results for grappler item: tf_graph
2019-12-24 10:55:50.344979: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 298 nodes (-126), 326 edges (-140), time = 2886.54199ms.
2019-12-24 10:55:50.344985: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] layout: Graph size after: 298 nodes (0), 326 edges (0), time = 698.473ms.
2019-12-24 10:55:50.344990: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 298 nodes (0), 326 edges (0), time = 1178.32605ms.

For FP16:
2019-12-24 10:56:25.241831: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:716] Optimization results for grappler item: tf_graph
2019-12-24 10:56:25.241877: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 298 nodes (-126), 326 edges (-140), time = 2877.59204ms.
2019-12-24 10:56:25.241883: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] layout: Graph size after: 298 nodes (0), 326 edges (0), time = 724.761ms.
2019-12-24 10:56:25.241889: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 298 nodes (0), 326 edges (0), time = 1268.29602ms.

I am using TensorFlow 1.14 and I'm not sure which TensorRT version it uses.

Thank you.
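Both summaries above list only constant-folding and layout passes. As far as I know, when TF-TRT conversion runs through Grappler in TF 1.x it usually appears in this same summary as its own entry (typically named `TensorRTOptimizer`), so its absence may be worth checking. The parser below is a hypothetical helper written against the log format shown above; the TensorRT check is only a case-insensitive substring heuristic, not an official API.

```python
import re
from typing import Iterable, List, NamedTuple


class PassResult(NamedTuple):
    name: str        # optimizer name as printed by Grappler
    nodes: int       # graph size after the pass
    node_delta: int  # change in node count caused by the pass
    time_ms: float   # time spent in the pass


# Matches e.g. "meta_optimizer.cc:718] layout: Graph size after: 298 nodes (0), 326 edges (0), time = 698.473ms."
_PASS_RE = re.compile(
    r"meta_optimizer\.cc:\d+\]\s+(?P<name>.+?): Graph size after: "
    r"(?P<nodes>\d+) nodes \((?P<node_delta>[+-]?\d+)\), "
    r"\d+ edges \([+-]?\d+\), time = (?P<ms>[\d.]+)ms"
)


def parse_grappler_summary(log_text: str) -> List[PassResult]:
    """Extract per-pass results from Grappler's meta_optimizer summary lines."""
    results = []
    for line in log_text.splitlines():
        m = _PASS_RE.search(line)
        if m:
            results.append(PassResult(m["name"], int(m["nodes"]),
                                      int(m["node_delta"]), float(m["ms"])))
    return results


def tensorrt_pass_ran(passes: Iterable[PassResult]) -> bool:
    """Heuristic: True if any pass name mentions TensorRT (case-insensitive)."""
    return any("tensorrt" in p.name.lower() for p in passes)


# The FP32 summary posted above.
SAMPLE_LOG = """\
2019-12-24 10:55:50.344979: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 298 nodes (-126), 326 edges (-140), time = 2886.54199ms.
2019-12-24 10:55:50.344985: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] layout: Graph size after: 298 nodes (0), 326 edges (0), time = 698.473ms.
2019-12-24 10:55:50.344990: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718] constant folding: Graph size after: 298 nodes (0), 326 edges (0), time = 1178.32605ms.
"""
passes = parse_grappler_summary(SAMPLE_LOG)
for p in passes:
    print(p.name, p.nodes, p.node_delta, p.time_ms)
print("TensorRT pass found:", tensorrt_pass_ran(passes))  # False for this sample
```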

Answer 4 · 2019-12-24T16:04:37.000Z

@Zhen-Xing I used a Keras model and converted it to a TensorFlow graph. You can take a look at the class below:

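import tensorflow as tf
from keras import backend as K  # assumes standalone Keras; use tensorflow.keras if the model was built with it
from keras.models import load_model

class FrozenGraph(object):
    """Freeze a Keras model into a constant GraphDef (TensorFlow 1.x)."""
    def __init__(self, model, shape):
        # shape is the per-sample input shape, e.g. (depth, height, width, channels);
        # the leading None leaves the batch dimension open.
        shape = (None, shape[0], shape[1], shape[2], shape[3])
        x_name = 'image_tensor_x'
        with K.get_session() as sess:
            x_tensor = tf.placeholder(tf.float32, shape, x_name)
            K.set_learning_phase(0)  # inference mode (e.g. dropout disabled)
            y_tensor = model(x_tensor)
            y_name = y_tensor.name[:-2]  # drop the ':0' output index to get the node name
            graph = sess.graph.as_graph_def()
            # Bake variable values into the graph as constants, then strip training-only nodes.
            graph0 = tf.graph_util.convert_variables_to_constants(sess, graph, [y_name])
            graph1 = tf.graph_util.remove_training_nodes(graph0)
        self.x_name = [x_name]
        self.y_name = [y_name]
        self.frozen = graph1

model = load_model(modelname)             # modelname: path to the saved Keras model
frozen_graph = FrozenGraph(model, shape)  # shape: per-sample input shape, as above
tf_engine = TfEngine(frozen_graph)        # TfEngine is a helper class not shown in this thread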
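A quick way to tell whether TF-TRT replaced anything is to count op types in the converted GraphDef. If no `TRTEngineOp` nodes appear and all the `Conv3D` nodes are still there, nothing was handed to TensorRT, which would be consistent with the nearly identical timings. I'm not certain which 3D ops TF-TRT in TF 1.14 can convert, so checking directly is safer than assuming. The sketch relies only on each `NodeDef` exposing an `op` field; the function name and the `trt_graph_def` variable are hypothetical, and fake nodes stand in for a real graph here.

```python
from collections import Counter
from types import SimpleNamespace  # only used for the stand-in example below

# Conv3DBackpropInputV2 is the op behind transposed 3D convolutions.
WATCH_OPS = ("TRTEngineOp", "Conv3D", "Conv3DBackpropInputV2")


def summarize_nodes(nodes, watch_ops=WATCH_OPS):
    """Count how many top-level nodes of each watched op type a graph contains."""
    counts = Counter(node.op for node in nodes)
    return {op: counts.get(op, 0) for op in watch_ops}


# With a real converted graph you would call: summarize_nodes(trt_graph_def.node)
fake_nodes = [SimpleNamespace(op=o) for o in ["Placeholder", "Conv3D", "Conv3D", "Relu", "Identity"]]
print(summarize_nodes(fake_nodes))  # {'TRTEngineOp': 0, 'Conv3D': 2, 'Conv3DBackpropInputV2': 0}
```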

Answer 5 · 2020-01-04T00:09:28.000Z

Could you attach the full log? The Grappler summary you posted doesn't contain the information I am looking for.
Here is how to get verbose logging: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#verbose

Answer 6 · 2020-01-29T15:26:21.000Z

Hi @pooyadavoodi,
I get the same log when I use these flags:
TF_CPP_VMODULE=segment=2,convert_graph=2,convert_nodes=2,trt_engine=1,trt_logger=2

But with the following flag, I get a log of about 4 GB:
TF_CPP_MIN_VLOG_LEVEL=2 python

I have uploaded that file here.
https://documentcloud.adobe.com/link/track?uri=urn%3Aaaid%3Ascds%3AUS%3Ad03518c5-cc26-4685-9c4c-6b32713b4b48

Please let me know if this is helpful.
Thank you.
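A 4 GB log is hard to share or read. One option is to keep only the lines emitted from the source files named in the `TF_CPP_VMODULE` list above. The sketch below assumes, as in the Grappler lines posted earlier, that each relevant line contains `<severity> <path>.cc:<line>]`; the function name, the prefix list, and the file names in the usage comment are hypothetical. It streams the file line by line, so the whole log never has to fit in memory.

```python
import re
from typing import Iterable

# Module prefixes taken from the TF_CPP_VMODULE value quoted above.
DEFAULT_PREFIXES = ("segment", "convert_graph", "convert_nodes", "trt_engine", "trt_logger")
# Matches the "<severity> <path>.cc:<line>]" part of a TensorFlow C++ log line.
_SOURCE_RE = re.compile(r"\s[IWEF]\s+(?P<path>\S+?)\.(?:cc|h):\d+\]")


def filter_tftrt_log(src_path: str, dst_path: str,
                     prefixes: Iterable[str] = DEFAULT_PREFIXES) -> int:
    """Copy only lines logged from files whose basename starts with one of `prefixes`.

    Returns the number of lines kept.
    """
    prefixes = tuple(prefixes)
    kept = 0
    with open(src_path, "r", errors="replace") as src, open(dst_path, "w") as dst:
        for line in src:
            m = _SOURCE_RE.search(line)
            if m and m["path"].rsplit("/", 1)[-1].startswith(prefixes):
                dst.write(line)
                kept += 1
    return kept


# Usage: kept = filter_tftrt_log("tftrt_full.log", "tftrt_filtered.log")
```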