How should I speed up an exported T5 SavedModel using TF-TRT?
My env:
Docker image: nvcr.io/nvidia/tensorflow:22.05-tf2-py3, TensorRT: 8.2.5.1, CUDA: 11.7, TensorFlow: 2.8
The original SavedModel takes 300 ms per inference at batch_size=32 and sequence_length=128, which is too slow for deployment, so I wanted to speed up T5 with TF-TRT. But when I convert the SavedModel with the code below, TF-TRT doesn't work:
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
import tensorflow_text  # registers the custom ops the T5 SavedModel depends on
import tensorflow as tf

# TrtGraphConverter is the TF1 API, so v2 behavior has to be disabled.
tf.compat.v1.disable_v2_behavior()

input_saved_model_dir = 'exported_model/batch32_length128_0810/1660123651'
output_saved_model_dir = 'trt_saved_model/batch32_length128_0810/1/'

converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_saved_model_dir,
    max_workspace_size_bytes=(1 << 32),  # 4 GB workspace
    max_batch_size=32,
    minimum_segment_size=50,
    precision_mode='FP32',
    is_dynamic_op=True,
    maximum_cached_engines=1)
converter.convert()
converter.save(output_saved_model_dir)
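For completeness, since the container ships TF 2.8, the TF2-native route would be trt.TrtGraphConverterV2 rather than the TF1 TrtGraphConverter. Below is a minimal sketch of that route, reusing the same paths and parameters from above; whether it segments this particular T5 export is untested:

from tensorflow.python.compiler.tensorrt import trt_convert as trt
import tensorflow_text  # registers the custom ops the T5 SavedModel depends on
import tensorflow as tf

input_saved_model_dir = 'exported_model/batch32_length128_0810/1660123651'
output_saved_model_dir = 'trt_saved_model/batch32_length128_0810/1/'

# TF2 conversion parameters mirroring the TF1 settings above.
# Note: do NOT call tf.compat.v1.disable_v2_behavior() for the V2 converter.
conversion_params = trt.TrtConversionParams(
    precision_mode=trt.TrtPrecisionMode.FP32,
    max_workspace_size_bytes=(1 << 32),
    minimum_segment_size=50,
    maximum_cached_engines=1)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=input_saved_model_dir,
    conversion_params=conversion_params)
converter.convert()
converter.save(output_saved_model_dir)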
Note: before running the code above, you have to add some code to tensorflow/python/compiler/tensorrt/trt_convert.py. The reference is here.
Could somebody help me with this?
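One way to check whether TF-TRT actually replaced anything is to count the TRTEngineOp nodes in the converted SavedModel; zero engines would mean no segment reached minimum_segment_size=50. A minimal sketch, assuming the model exposes the default 'serving_default' signature key:

import tensorflow_text  # needed so the T5 SavedModel's custom ops can load
import tensorflow as tf

saved = tf.saved_model.load('trt_saved_model/batch32_length128_0810/1/')
func = saved.signatures['serving_default']  # assumed signature key

# TRTEngineOp nodes can live in the main graph or in library functions.
gdef = func.graph.as_graph_def()
nodes = list(gdef.node)
for f in gdef.library.function:
    nodes.extend(f.node_def)

num_engines = sum(1 for n in nodes if n.op == 'TRTEngineOp')
print('TRTEngineOp count:', num_engines)  # 0 means TF-TRT converted nothing

If the count is zero, lowering minimum_segment_size is a first thing to try, since a high threshold like 50 can prevent any segment from being converted at all.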