google-research/text-to-text-transfer-transformer

How should I speed up a T5 exported SavedModel using TF-TRT?

chenl-chief opened this issue · 0 comments

My env:

Docker image: nvcr.io/nvidia/tensorflow:22.05-tf2-py3, TRT: 8.2.5.1, CUDA: 11.7
tf 2.8

The original SavedModel takes ~300 ms per call with batch_size=32 and sequence_length=128, which is too slow for deployment. So I wanted to speed up T5 using TF-TRT. But when I convert the SavedModel with the code below, TF-TRT has no effect:

from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
import tensorflow_text
import tensorflow as tf

tf.compat.v1.disable_v2_behavior()

input_saved_model_dir = 'exported_model/batch32_length128_0810/1660123651'
output_saved_model_dir = 'trt_saved_model/batch32_length128_0810/1/'
converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_saved_model_dir,
    max_workspace_size_bytes=(1 << 32),  # 4 GiB; note `11<32` is a boolean comparison, not a size
    max_batch_size=32,
    minimum_segment_size=50,
    precision_mode='FP32',
    is_dynamic_op=True,
    maximum_cached_engines=1)

converter.convert()
converter.save(output_saved_model_dir)
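As a side note, `TrtGraphConverter` is the TF 1.x API; for a model exported with TF 2.x, `TrtGraphConverterV2` is usually the intended entry point and does not require `disable_v2_behavior()` or source patches. Below is a minimal sketch of that path. The paths mirror the ones above; the dummy input in `input_fn` assumes the exported serving signature takes a batch of raw text strings (as T5's text-in/text-out export typically does), which may not match your signature.

```python
# Sketch: TF2-native TF-TRT conversion with TrtGraphConverterV2.
# Assumes a TF 2.x SavedModel and a string-input serving signature.
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import tensorflow as tf

params = trt.TrtConversionParams(
    precision_mode=trt.TrtPrecisionMode.FP32,
    max_workspace_size_bytes=1 << 32,  # 4 GiB
    minimum_segment_size=3,            # small segments, to see whether anything converts at all
    maximum_cached_engines=1)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='exported_model/batch32_length128_0810/1660123651',
    conversion_params=params)
converter.convert()

def input_fn():
    # Hypothetical calibration/build input: one batch at the deployment shape.
    # Replace with inputs matching your exported signature.
    yield (tf.constant(["translate English to German: hello"] * 32),)

# Pre-build engines so the first serving request does not pay the build cost.
converter.build(input_fn=input_fn)
converter.save('trt_saved_model/batch32_length128_0810/1/')
```

Even with the V2 converter, T5's autoregressive decoding loop contains control flow that TF-TRT generally cannot place inside a TensorRT segment, so it is possible that few or no ops convert; checking the converted graph for `TRTEngineOp` nodes will tell you whether conversion actually took effect.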

Before running this code, you also need to add some code to tensorflow/python/compiler/tensorrt/trt_convert.py (the reference I followed is linked here).
Could somebody help me with this?