Kaggle/docker-python

TF TensorRT misconfigured

maciejskorski opened this issue · 0 comments

๐Ÿ› Bug

TensorFlow's TensorRT integration (TF-TRT) seems to be linked against the wrong TensorRT version.

To Reproduce

On several recent images (including gcr.io/kaggle-gpu-images/python latest 311277776c9b, created 7 days ago, 47.2GB), the TensorRT version TensorFlow was linked against differs from the one it loads at runtime: 8.4 vs 8.6.

import tensorflow.compiler as tf_cc
# Compare the TensorRT version TensorFlow was compiled against with the one loaded at runtime
linked_trt_ver = tf_cc.tf2tensorrt._pywrap_py_utils.get_linked_tensorrt_version()
print(f"Linked TRT ver: {linked_trt_ver}")
loaded_trt_ver = tf_cc.tf2tensorrt._pywrap_py_utils.get_loaded_tensorrt_version()
print(f"Loaded TRT ver: {loaded_trt_ver}")
# Linked TRT ver: (8, 4, 3)
# Loaded TRT ver: (8, 6, 1)

That was from Python. At the system level, the installed TensorRT inference libraries are indeed at 8.6:

dpkg -l | grep TensorRT

So the minimal compatibility rules are met: the major versions match, and the loaded version is more recent than the linked one.
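The two minimal rules (quoted from TensorFlow's warning message under Additional context) can be written as a small check. This is a sketch that assumes versions are `(major, minor, patch)` tuples like those returned by `get_linked_tensorrt_version()` and `get_loaded_tensorrt_version()`; `trt_versions_compatible` is a hypothetical helper, not a TensorFlow API. For the versions reported here it returns `True`, consistent with the observation that the rules are met even though the sample below still crashes.

```python
# Sketch of the two minimal TensorRT compatibility rules from TensorFlow's
# warning message, applied to (major, minor, patch) version tuples.
# trt_versions_compatible is a hypothetical helper, not a TensorFlow API.
from typing import Tuple

Version = Tuple[int, int, int]


def trt_versions_compatible(linked: Version, loaded: Version) -> bool:
    """Return True if the loaded TensorRT satisfies the minimal rules."""
    # Rule 1: same major version at compile time and at runtime.
    same_major = linked[0] == loaded[0]
    # Rule 2: no forward compatibility, so loaded must be >= linked.
    not_older = tuple(loaded) >= tuple(linked)
    return same_major and not_older


# Versions reported in this issue
print(trt_versions_compatible((8, 4, 3), (8, 6, 1)))  # True
print(trt_versions_compatible((8, 6, 1), (8, 4, 3)))  # False: loaded is older
print(trt_versions_compatible((8, 4, 3), (9, 0, 0)))  # False: major differs
```

Passing this check only rules out the obvious mismatches; it says nothing about whether minor-version differences (8.4 vs 8.6) are safe in practice.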

However, the setup still does not work: in these recent containers, even a minimal TF-TRT sample crashes:

import faulthandler

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.keras.applications.resnet50 import ResNet50

# Print a Python traceback if the process receives a fatal signal (e.g. SIGSEGV)
faulthandler.enable()

# Export a pretrained ResNet50 as a SavedModel
tf_model_dir = './models/tf_model'
model = ResNet50(include_top=True, weights='imagenet')
model.save(tf_model_dir)

# Convert the SavedModel with TF-TRT using default conversion parameters
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=tf_model_dir,
)
converter.convert()

MAX_BATCH_SIZE = 1

def input_fn():
    # build() expects a generator that yields tuples of input tensors
    img = tf.random.normal((MAX_BATCH_SIZE, 224, 224, 3), dtype=tf.float32)
    yield (img,)

# Build the TensorRT engines: SEGMENTATION FAULT can happen here under misconfigured software!
converter.build(input_fn=input_fn)
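Because a segmentation fault kills the whole Python process (and with it a notebook kernel), it can help to run the reproduction in a child process and inspect its exit status. This is a sketch assuming a POSIX system, where a child terminated by signal N reports return code -N; `run_isolated` and `died_from_segfault` are hypothetical helpers, not part of TensorFlow or the Kaggle image.

```python
# Sketch: run a code snippet in a separate interpreter so that a segfault
# shows up as an exit status instead of killing the current session.
# Assumes POSIX semantics (a child killed by signal N has returncode -N).
# run_isolated and died_from_segfault are hypothetical helper names.
import signal
import subprocess
import sys


def run_isolated(code: str) -> int:
    """Run `code` with the current Python interpreter and return its exit code."""
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode


def died_from_segfault(returncode: int) -> bool:
    """True if the child process was terminated by SIGSEGV."""
    return returncode == -signal.SIGSEGV
```

For example, with the sample above saved as `repro.py` (a hypothetical file name), `died_from_segfault(run_isolated(open("repro.py").read()))` would be expected to return `True` on an affected image if the crash reproduces. `faulthandler` re-raises the signal after dumping its traceback, so enabling it in the child should not change the reported exit status.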

Expected behavior

Align versions and make the sample code runnable.

Additional context

See the NVIDIA installation guidelines

Conditions from `tensorrt

        "Loaded TensorRT %s but linked TensorFlow against TensorRT %s. A few "
        "requirements must be met:\n"
        "\t-It is required to use the same major version of TensorRT during "
        "compilation and runtime.\n"
        "\t-TensorRT does not support forward compatibility. The loaded "
        "version has to be equal or more recent than the linked version.",