Converted FP16 or INT8 models require up to 10 minutes to start up.
devalexqt opened this issue · 6 comments
Why does it take so long and use almost 30GB of GPU memory? Is it rebuilding the model every time I run it? Can you fix it?
To avoid the long startup time, please call converter.build() before you save the model; see the example here. Otherwise only a placeholder for the TRT engine is saved in the graph, and the TRT engine is rebuilt every time you load the model.
I already have converter.build() in my code, but this does NOT prevent the rebuild at every startup in INT8 or FP16 mode.
import os
import tensorflow as tf

# INPUT_SAVED_MODEL_DIR and OUTPUT_SAVED_MODEL_DIR are defined elsewhere.
BATCH_SIZE = 1
data_directory = "/dataset/hr_train"
calibration_files = [os.path.join(path, name) for path, _, files in os.walk(data_directory) for name in files]
print('There are %d calibration files. \n%s\n%s\n...' % (len(calibration_files), calibration_files[0], calibration_files[-1]))

def parse_file(filepath):
    # Load an image, take a random 360x640 crop, and scale it to [0, 1].
    image = tf.io.read_file(filepath)
    image = tf.image.decode_image(image, channels=3)
    image = tf.image.random_crop(image, size=(360, 640, 3))
    image = tf.cast(image, tf.float32) / 255
    # image = tf.expand_dims(image, axis=0)
    return image

num_calibration_batches = 10
dataset = tf.data.Dataset.from_tensor_slices(calibration_files)
dataset = dataset.map(map_func=parse_file, num_parallel_calls=20)
dataset = dataset.batch(batch_size=BATCH_SIZE)
dataset = dataset.repeat(None)
calibration_dataset = dataset.take(num_calibration_batches)

def my_calibration_input_fn():
    for x in calibration_dataset:
        yield (x, )

params = tf.experimental.tensorrt.ConversionParams(
    precision_mode='INT8',
    maximum_cached_engines=1,
    use_calibration=True,
    # max_workspace_size_bytes=40000000,
)
converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir=INPUT_SAVED_MODEL_DIR,
    conversion_params=params,
)
converter.convert(calibration_input_fn=my_calibration_input_fn)

def my_input_fn():
    inp1 = tf.random.normal([1, 360, 640, 3])
    yield [inp1]

converter.build(input_fn=my_input_fn)  # Generate corresponding TRT engines
converter.save(OUTPUT_SAVED_MODEL_DIR)
Looks like it's a bug in the TRT engine.
Thanks @devalexqt for the update.
TF-TRT creates a new engine every time it sees an input shape that it cannot handle with an existing engine. For example, if you create an engine with batch size N=1 and then run inference with N=8, a new engine will be created (large overhead) and stored in the engine cache. Further inference requests with N <= 8 should run using that engine without large overhead.
If this does not apply to you, that means we have a bug. We would need some information on your network (preferably a reproducer script) to investigate that.
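The caching rule described above can be pictured with a small toy model. This is only a sketch of the behaviour as stated in this comment, not TF-TRT's actual implementation: the class name `ToyEngineCache` and its fields are made up for illustration, it looks only at the batch dimension, and it ignores eviction under maximum_cached_engines.

```python
class ToyEngineCache:
    """Toy model of the engine cache rule; NOT TF-TRT's real code."""

    def __init__(self):
        self.max_batch_sizes = []  # one entry per "engine" built so far
        self.builds = 0            # number of expensive builds triggered

    def infer(self, batch_size):
        # Reuse any cached engine whose max batch size covers the request.
        for max_n in self.max_batch_sizes:
            if batch_size <= max_n:
                return "reused"
        # Otherwise simulate an expensive engine build for this batch size.
        self.max_batch_sizes.append(batch_size)
        self.builds += 1
        return "built"


cache = ToyEngineCache()
cache.max_batch_sizes.append(1)  # pretend an N=1 engine was prebuilt and saved
results = [cache.infer(n) for n in (1, 8, 4, 8, 16)]
print(results)
print("builds:", cache.builds)
```

In this toy run only the first N=8 and the N=16 requests trigger a build; everything else is served from the cache. If every startup is slow even though the inference shape matches the shape used in converter.build(), the rule above does not explain it.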
How does the memory size compare to the original model size? How many engines are created? If you have a large number of engines, that might explain the memory consumption. If you increase the minimum_segment_size parameter, that would reduce the number of engines and the memory consumption.
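As a rough illustration of why a larger minimum_segment_size leads to fewer engines: a candidate subgraph is only replaced by a TRTEngineOp if it contains at least that many nodes. The sketch below is a toy calculation, not TF-TRT code; the helper `count_engines` and the segment sizes are hypothetical.

```python
def count_engines(segment_sizes, minimum_segment_size):
    """Count candidate segments large enough to become a TensorRT engine."""
    return sum(1 for size in segment_sizes if size >= minimum_segment_size)


# Hypothetical node counts of candidate segments in some graph.
segment_sizes = [475, 12, 5, 3, 2]
counts = {t: count_engines(segment_sizes, t) for t in (3, 10, 50)}
for threshold, n in counts.items():
    print("minimum_segment_size={:3d} -> {} engine(s)".format(threshold, n))
```

Segments below the threshold stay in native TensorFlow, so raising the threshold trades some TensorRT coverage for fewer engines.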
Here is how to print the number of engines:
import re

import tensorflow as tf
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants

def get_func_from_saved_model(saved_model_dir):
    # Load the SavedModel and return its default serving signature.
    saved_model_loaded = tf.saved_model.load(
        saved_model_dir, tags=[tag_constants.SERVING])
    graph_func = saved_model_loaded.signatures[
        signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
    return graph_func, saved_model_loaded

loaded_func, _ = get_func_from_saved_model("/tmp/models/trt_model")

print('Engine name num of nodes')
n_engines = 0
pattern = re.compile(r'(TRTEngineOp_\d+_\d+)')
# Each TRT engine shows up as a function in the graph's function library.
for func in loaded_func.graph.as_graph_def().library.function:
    m = pattern.search(func.signature.name)
    if m:
        n_engines += 1
        print("{:20s} {:5d}".format(m.group(1), len(func.node_def)))
print('\nTotal number of TensorRT engines', n_engines)
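To see what the regular expression in the script picks out, here is a minimal standalone sketch that runs the same pattern over a few function names. The names in the list are invented for illustration and are only assumed to resemble what a converted graph contains.

```python
import re

pattern = re.compile(r'(TRTEngineOp_\d+_\d+)')

# Hypothetical function names, made up for this example.
names = [
    "TRTEngineOp_0_0_native_segment",
    "TRTEngineOp_0_1_native_segment",
    "__inference_pruned_1234",
]

# Keep only the names that contain an engine identifier.
engines = [m.group(1) for m in map(pattern.search, names) if m]
print(engines)
print("Total number of TensorRT engines", len(engines))
```

Names that do not contain a TRTEngineOp_<i>_<j> identifier are skipped, so only engine functions are counted.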
Let me check.
For testing I use the nvcr.io/nvidia/tensorflow:21.05-tf2-py3 docker image with batch=1, and I create only one engine, for the input inp1 = tf.random.normal([1,360,640,3]).
Output of your script:
Engine name num of nodes
TRTEngineOp_0_0 475
Total number of TensorRT engines 1
The original model is 2.5MB on disk, but the converted model is 13MB.