Converted FP16 or INT8 model requires up to 10 minutes to start up.

Question

devalexqt opened this issue 3 years ago · 6 comments

Why does it take so long, and why does it use almost 30GB of GPU memory? Is the model rebuilt every time I run it? Can you fix it?

Answer 1 · 2021-06-21T14:27:07.000Z

To avoid the long startup time, call converter.build() before you save the model; see the example here. Otherwise only a placeholder for the TRT engine is saved in the graph, and the TRT engine is rebuilt every time you load the model.
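One way to check whether the engine is being rebuilt at load time is to compare the latency of the first and second inference calls after loading. The helper below is a minimal sketch; `time_first_and_second_call` and `fake_infer` are hypothetical names, and `fake_infer` only simulates a model that is slow on its first call.

```python
import time
from typing import Callable, Tuple


def time_first_and_second_call(fn: Callable[[], object]) -> Tuple[float, float]:
    """Return wall-clock seconds for the first and the second call of fn.

    A first call that is far slower than the second suggests the TRT engine
    was built at runtime instead of being loaded from the SavedModel.
    """
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start

    start = time.perf_counter()
    fn()
    second = time.perf_counter() - start
    return first, second


if __name__ == "__main__":
    state = {"calls": 0}

    def fake_infer() -> None:
        # Hypothetical stand-in for a loaded model: slow only on the first call.
        state["calls"] += 1
        if state["calls"] == 1:
            time.sleep(0.2)

    first, second = time_first_and_second_call(fake_infer)
    print(f"first call: {first:.3f} s, second call: {second:.3f} s")
```

With a real model you would pass something like `lambda: infer(tf.random.normal([1, 360, 640, 3]))`, where `infer` is assumed to be the loaded serving signature.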

Answer 2 · 2021-06-21T15:27:55.000Z

I already call converter.build() in my code, but it does NOT prevent the rebuild at every startup in INT8 or FP16 mode.

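import os

import tensorflow as tf

# INPUT_SAVED_MODEL_DIR and OUTPUT_SAVED_MODEL_DIR are assumed to be defined elsewhere.
BATCH_SIZE = 1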
data_directory = "/dataset/hr_train"
calibration_files = [os.path.join(path, name) for path, _, files in os.walk(data_directory) for name in files]
print('There are %d calibration files. \n%s\n%s\n...'%(len(calibration_files), calibration_files[0], calibration_files[-1]))

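def parse_file(filepath):
    image = tf.io.read_file(filepath)
    # expand_animations=False guarantees a 3-D tensor (no 4-D output for GIFs).
    image = tf.image.decode_image(image, channels=3, expand_animations=False)
    image = tf.image.random_crop(image, size=(360, 640, 3))
    image = tf.cast(image, tf.float32) / 255
    # No expand_dims here: dataset.batch() below adds the batch dimension.
    return image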

num_calibration_batches = 10

dataset = tf.data.Dataset.from_tensor_slices(calibration_files)
dataset = dataset.map(map_func=parse_file, num_parallel_calls=20)
dataset = dataset.batch(batch_size=BATCH_SIZE)
dataset = dataset.repeat(None)
calibration_dataset = dataset.take(num_calibration_batches)

def my_calibration_input_fn():
    for x in calibration_dataset:
        yield (x, )

params = tf.experimental.tensorrt.ConversionParams(
    precision_mode='INT8',
    maximum_cached_engines=1,
    use_calibration=True,
    # max_workspace_size_bytes=40000000,
)
converter = tf.experimental.tensorrt.Converter(
    input_saved_model_dir=INPUT_SAVED_MODEL_DIR,
    conversion_params=params,
)

converter.convert(calibration_input_fn=my_calibration_input_fn)

def my_input_fn():
    # Build the engine for the same input shape that is used at inference.
    inp1 = tf.random.normal([1, 360, 640, 3])
    yield [inp1]


converter.build(input_fn=my_input_fn)  # Generate corresponding TRT engines
converter.save(OUTPUT_SAVED_MODEL_DIR)

Answer 3 · 2021-06-21T15:28:56.000Z

Looks like it's a bug in the TRT engine.

Answer 4 · 2021-06-25T12:23:42.000Z

Thanks @devalexqt for the update.

TF-TRT creates a new engine every time it sees an input shape that the existing engines cannot handle. For example, if you build an engine with batch size N=1 and then run inference with N=8, a new engine is created (large overhead) and stored in the engine cache. Subsequent inference requests with N <= 8 should then run on that engine without the large overhead.
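The reuse rule described above can be sketched as a toy cache. This is a simplified model, not TF-TRT's implementation: the hypothetical `EngineCacheModel` assumes an engine built for shape (N, H, W, C) serves any request (M, H, W, C) with M <= N, and that every other shape triggers a new build.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


class EngineCacheModel:
    """Toy model of the batch-size reuse rule (hypothetical class)."""

    def __init__(self) -> None:
        self.engines: List[Shape] = []
        self.builds = 0

    def run(self, shape: Shape) -> Shape:
        batch, rest = shape[0], shape[1:]
        for engine in self.engines:
            if engine[1:] == rest and batch <= engine[0]:
                return engine  # cache hit: no rebuild
        self.builds += 1  # cache miss: build and cache a new engine
        self.engines.append(shape)
        return shape


cache = EngineCacheModel()
cache.run((1, 360, 640, 3))  # first engine, N=1
cache.run((8, 360, 640, 3))  # N=8 cannot use the N=1 engine, so a new build
cache.run((4, 360, 640, 3))  # served by the N=8 engine
print(cache.builds)  # 2
```

In this model, a fixed input shape such as [1, 360, 640, 3] should never cause more than one build, which is why a rebuild on every startup points to something other than shape changes.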

If this does not apply to you, that means we have a bug. We would need some information on your network (preferably a reproducer script) to investigate that.

How does the memory size compare to the original model size? How many engines are created? A large number of engines might explain the memory consumption. Increasing the minimum_segment_size parameter would reduce the number of engines and the memory consumption.
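The effect of minimum_segment_size (a field of tf.experimental.tensorrt.ConversionParams) can be illustrated with a toy filter: candidate segments with fewer nodes than the threshold stay in native TensorFlow, and each remaining segment becomes one TRTEngineOp. The function name and the segment sizes below are hypothetical; real segmentation is done by the converter.

```python
from typing import List


def engines_after_filter(segment_sizes: List[int], minimum_segment_size: int) -> List[int]:
    """Return the candidate segments that would become TRT engines.

    Toy model: segments with fewer nodes than minimum_segment_size stay in
    native TensorFlow; every remaining segment becomes one TRTEngineOp.
    """
    return [size for size in segment_sizes if size >= minimum_segment_size]


# Hypothetical node counts for candidate segments in some network.
segments = [475, 12, 4, 2]
print(len(engines_after_filter(segments, 3)))   # 3 engines
print(len(engines_after_filter(segments, 10)))  # 2 engines
```

A larger threshold means fewer engines, at the cost of running more small subgraphs in native TensorFlow.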

Here is how to print the number of engines:

import re

import tensorflow as tf
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants

def get_func_from_saved_model(saved_model_dir):
    saved_model_loaded = tf.saved_model.load(
        saved_model_dir, tags=[tag_constants.SERVING])
    graph_func = saved_model_loaded.signatures[
        signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
    return graph_func, saved_model_loaded

loaded_func, _ = get_func_from_saved_model("/tmp/models/trt_model")

print('Engine name       num of nodes')
n_engines = 0
pattern = re.compile(r'(TRTEngineOp_\d+_\d+)')
for func in loaded_func.graph.as_graph_def().library.function:
    m = pattern.search(func.signature.name)
    if m:
        n_engines += 1
        print("{:20s} {:5d}".format(m.group(1), len(func.node_def)))
print('\nTotal number of TensorRT engines', n_engines)
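The name-matching part of the script can be checked without a GPU or a saved model. This sketch reuses the same regular expression on a plain list of strings; `trt_engine_names` and the sample function names are hypothetical.

```python
import re
from typing import Iterable, List

# Same pattern as in the script above.
ENGINE_PATTERN = re.compile(r'(TRTEngineOp_\d+_\d+)')


def trt_engine_names(function_names: Iterable[str]) -> List[str]:
    """Return the TRTEngineOp name found in each matching function name."""
    names = []
    for name in function_names:
        m = ENGINE_PATTERN.search(name)
        if m:
            names.append(m.group(1))
    return names


# Hypothetical function names, as they might appear in graph_def.library.function.
sample = [
    "__inference_signature_wrapper_1234",
    "TRTEngineOp_0_0_native_segment",
]
print(trt_engine_names(sample))  # ['TRTEngineOp_0_0']
```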

Answer 5 · 2021-06-30T00:56:10.000Z

Let me check.

Answer 6 · 2021-06-30T12:38:19.000Z

For testing I use the nvcr.io/nvidia/tensorflow:21.05-tf2-py3 Docker image with batch size 1, and I build only one engine, for the input inp1 = tf.random.normal([1,360,640,3]).

Output of your script:

Engine name       num of nodes
TRTEngineOp_0_0        475

Total number of TensorRT engines 1

The original model is 2.5MB on disk, but the converted model is 13MB.