tensorflow/tensorrt

Running multiple TensorRT-optimized models in Tensorflow

Opened this issue · 3 comments

I am working on a TensorFlow 2.0 project that uses multiple models for inference.
Some of those models were optimized using TF-TRT.

I tried both regular offline conversion and offline conversion with engine serialization. With regular conversion, the TensorRT engine is rebuilt every time the model execution context changes. With serialized engines, I am not able to load more than one TensorRT-optimized model. A rough sketch of the conversion step I use is included below.
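Roughly, the offline conversion with engine serialization looks like this (the paths, precision mode, and input shape below are placeholders, not my real values):

# Offline TF-TRT conversion with pre-built (serialized) engines.
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode="FP16")

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="saved_model_a",   # placeholder input path
    conversion_params=params)
converter.convert()

def input_fn():
    # Representative input used to pre-build the TensorRT engines.
    yield (np.zeros((1, 224, 224, 3), dtype=np.float32),)

converter.build(input_fn=input_fn)    # builds and serializes the engines
converter.save("saved_model_a_trt")   # placeholder output path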

My application uses a single Session at runtime.

I am using the nvcr.io/nvidia/tensorflow:19.12-tf2-py3 Docker container to optimize the models and run the application.

More about the issue in:
https://stackoverflow.com/questions/60967867/running-multiple-tensorrt-optimized-models-in-tensorflow

What is the correct approach to simultaneously run multiple TensorRT-optimized models with pre-built engines in TensorFlow?

Is it a valid solution to use a separate Session for each of those models?
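For context, a rough sketch of how I load and call the converted models in one process (paths, signature name, and input are placeholders):

import tensorflow as tf

# Both SavedModels were converted offline with TF-TRT (placeholder paths).
model_a = tf.saved_model.load("saved_model_a_trt")
model_b = tf.saved_model.load("saved_model_b_trt")

infer_a = model_a.signatures["serving_default"]
infer_b = model_b.signatures["serving_default"]

image = tf.zeros((1, 224, 224, 3), dtype=tf.float32)  # placeholder input
out_a = infer_a(image)   # first model runs fine
out_b = infer_b(image)   # with serialized engines, the second model is where loading/inference fails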

Thanks for the detailed report. Having multiple models with multiple pre-built engines is a valid use case. There seems to be a problem with the way the engines are cached, and we are working on it. This is related to issue #195; we will continue the discussion there.

@tfeher
I am also having a problem running two TensorRT-optimized models. Inference completes for the first network, but the second network produces the errors included below. Is this a similar issue or something completely different? I am using TF 2.1.0, and both models run properly when run separately; however, when I load both models in the same program and run inference sequentially, the second model always fails with the cache-size error.

2020-06-23 12:22:53.617659: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at trt_engine_op.cc:494 : Invalid argument: Input shape list size mismatch for PartitionedCall/TRTEngineOp_5, cached size: 6 vs. actual size: 1
2020-06-23 12:22:53.654311: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: Input shape list size mismatch for PartitionedCall/TRTEngineOp_5, cached size: 6 vs. actual size: 1
[[{{node PartitionedCall/TRTEngineOp_5}}]]
Traceback (most recent call last):
File "live_inf.py", line 108, in
image_s,results=compute_inference_seg(infer_seg,image_np)
File "live_inf.py", line 69, in compute_inference_seg
results=infer(input_tensor)['output'][0]
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1551, in call
return self._call_impl(args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1591, in _call_impl
return self._call_flat(args, self.captured_inputs, cancellation_manager)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/load.py", line 99, in _call_flat
cancellation_manager)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 545, in call
ctx=ctx)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input shape list size mismatch for PartitionedCall/TRTEngineOp_5, cached size: 6 vs. actual size: 1
[[{{node PartitionedCall/TRTEngineOp_5}}]] [Op:__inference_signature_wrapper_29995]

Function call stack:
signature_wrapper

@anoushsepehri
I am facing the same issue with multiple networks converted by TensorRT.
Have you found any workaround?