tensorflow/tensorrt

No improvement using TensorRT5

IwakuraRein opened this issue · 0 comments

  • system: Ubuntu 16.04
  • python: 3.6.13
  • tensorflow: 1.15.0
  • TensorRT: 5.0.2.6
  • GPU: RTX2080TI

At first I installed TensorRT 7, but TensorFlow gave an error while looking for 'libnvinfer.so.5': there is no 'libnvinfer.so.5' on my system, only 'libnvinfer.so.7'. I then installed TensorRT 5 and followed the instructions here. This time I successfully created the optimized graph.
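
For anyone checking the same thing, a minimal probe (not part of my original script) of which libnvinfer the dynamic loader can actually resolve; the prebuilt TF 1.15 I use looks specifically for 'libnvinfer.so.5':

    # Probe which libnvinfer shared objects the loader can find (sketch).
    import ctypes

    for soname in ('libnvinfer.so.5', 'libnvinfer.so.7'):
        try:
            ctypes.CDLL(soname)
            print(soname, 'found')
        except OSError:
            print(soname, 'not found')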

My code:

    import numpy as np
    import tensorflow as tf
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    config = tf.ConfigProto(allow_soft_placement=True, graph_options=tf.GraphOptions(
        optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))
    config.gpu_options.allow_growth = True
    
    # load saved model
    with tf.gfile.GFile(SAVE_PATH+'_classroom/model/best_model.pb', 'rb') as f:
        frozen_graph = tf.GraphDef()
        frozen_graph.ParseFromString(f.read())

    # create optimized graph
    trt_graph = trt.TrtGraphConverter(
        input_graph_def=frozen_graph,
        session_config=config,
        nodes_blacklist=return_elements_list,
        is_dynamic_op=True,
        precision_mode=precision,
        minimum_segment_size=segment).convert()

    sess = tf.Session(config=config)
    tf.import_graph_def(trt_graph, input_map={'source': model.source},
        return_elements=return_elements_list)
    run_metadata = tf.RunMetadata()
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    sess.run(tf.global_variables_initializer())
    while True:
        try:
            src_hdr_in, tgt_hdr_in = sess.run(next_element_large,
                feed_dict={handle_large: test_handle})
            # zero-pad each batch from IMAGE_HEIGHT up to PADDING_HEIGHT
            src_hdr = np.zeros((test_batch_size, PADDING_HEIGHT, IMAGE_WIDTH, INPUT_CHANNEL))
            tgt_hdr = np.zeros((test_batch_size, PADDING_HEIGHT, IMAGE_WIDTH, TARGET_CHANNEL))
            src_hdr[:, 0:IMAGE_HEIGHT, :, :] = src_hdr_in
            tgt_hdr[:, 0:IMAGE_HEIGHT, :, :] = tgt_hdr_in
            feed_dict = {model.source: src_hdr}
            output_tensor = sess.graph.get_tensor_by_name(output_tensor_name)
            denoised_1_bd = sess.run(output_tensor, feed_dict, options=run_options, run_metadata=run_metadata)
    # ...
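
To rule out a conversion that silently produced no TensorRT segments, here is a small check that can be run right after convert() (a sketch; 'TRTEngineOp' is the op type TF-TRT inserts for optimized subgraphs):

    # Count the TRTEngineOp nodes the converter created. If this is 0,
    # nothing was actually offloaded to TensorRT and identical timings
    # would be expected.
    num_trt_ops = len([n for n in trt_graph.node if n.op == 'TRTEngineOp'])
    print('TRTEngineOp nodes: %d / %d total nodes' % (num_trt_ops, len(trt_graph.node)))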

I used TensorFlow's profiler to generate a timeline.json. There are several kernel names in the timeline I had never seen before, such as 'volta_scudnn_128x32_relu_small_nn_v1', so I believe the profiler is tracing the optimized graph, not the vanilla one.
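
For reference, the timeline is dumped from the run_metadata collected in the loop above, roughly like this (a sketch, not my exact code):

    # Write a Chrome-trace timeline from the collected step stats.
    from tensorflow.python.client import timeline

    tl = timeline.Timeline(run_metadata.step_stats)
    with open('timeline.json', 'w') as f:
        f.write(tl.generate_chrome_trace_format())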

However, according to timeline.json there is no improvement: the inference times are nearly the same as before. My network is a pure CNN with a structure similar to U-Net, so I expected a speedup of roughly 2x.
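
A simple wall-clock comparison (a hypothetical helper, not the measurement above) that skips the first iterations, since with is_dynamic_op=True the TensorRT engines are only built on the first run:

    import time

    # Hypothetical timing helper: warm up first so dynamic engine building
    # is not counted, then average the per-iteration inference time.
    def time_inference(sess, output_tensor, feed_dict, warmup=5, iters=50):
        for _ in range(warmup):
            sess.run(output_tensor, feed_dict)
        start = time.time()
        for _ in range(iters):
            sess.run(output_tensor, feed_dict)
        return (time.time() - start) / iters

    # e.g. with the names from the loop above:
    # print('avg inference: %.4f s' % time_inference(sess, output_tensor, feed_dict))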