NVIDIA/TensorRT

SavedModel to TensorRT converter fails if the model uses lookup tables


Description

The TF-TRT converter fails to save the model if it uses a lookup table. This colab shows how I build a simple model with a Keras IndexLookup layer, which in turn makes the SavedModel contain a DT_RESOURCE tensor holding the lookup table contents. The last cell of the notebook, which does the conversion to TensorRT, fails because TensorRT is not currently supported on Colab; to actually convert the model from SavedModel to TensorRT I use the NVIDIA TensorFlow container, see "Steps To Reproduce" below.
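
For reference, here is a minimal sketch of the kind of model the colab builds (the input names match the signature in the traceback below; the layers and vocabulary are illustrative, not the exact colab code):

    import tensorflow as tf

    # A StringLookup layer (built on IndexLookup) embeds a hash table in the
    # SavedModel as a DT_RESOURCE tensor.
    age = tf.keras.Input(shape=(1,), dtype=tf.float32, name="age")
    cabin = tf.keras.Input(shape=(1,), dtype=tf.string, name="cabin")

    cabin_ids = tf.keras.layers.StringLookup(vocabulary=["A", "B", "C"])(cabin)
    x = tf.keras.layers.Concatenate()([age, tf.cast(cabin_ids, tf.float32)])
    output = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    tf.keras.Model(inputs=[age, cabin], outputs=output).save("/models/model")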

Does TF-TRT support lookup tables? If not, are there known workarounds?

I searched for related bug reports and changes and found that older versions of TF failed with a different error (see e.g. tensorflow/tensorflow#42673). This commit seems to have changed that error into the one I'm reporting.

I also attempted to convert to ONNX but hit a similar problem; I found an existing open bug report and attached a colab reproducing the error there, see onnx/tensorflow-onnx#1867.

Environment

TensorRT Version: 8.2.5-1+cuda11.4
NVIDIA GPU: A10G
NVIDIA Driver Version: 470.57.02 (in CUDA Forward Compatibility mode "Using CUDA 11.7 driver version 515.48.08 with kernel driver version 470.57.02")
CUDA Version: 11.7
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.8.10
Tensorflow Version (if applicable): 2.9.1
Baremetal or Container (if so, version): nvcr.io/nvidia/tensorflow:22.06-tf2-py3

Steps To Reproduce

  1. Train a model using the code from the colab notebook and store the model under /models/model.

  2. Start the NVIDIA Tensorflow container with:

    docker run --gpus all -it -v/models:/models --rm nvcr.io/nvidia/tensorflow:22.06-tf2-py3

  3. Run the following script in Python (the traceback below shows it was saved as /models/convert_to_tensorrt.py):

    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Default conversion parameters (FP32 precision mode).
    conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir="/models/model",
        conversion_params=conversion_params)
    converter.convert()               # the conversion itself succeeds
    converter.save("/models/output")  # fails with the traceback below

The error traceback:

Traceback (most recent call last):
  File "/models/convert_to_tensorrt.py", line 7, in <module>
    converter.save("/models/output")
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 1510, in save
    save.save(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/save.py", line 1290, in save
    save_and_return_nodes(obj, export_dir, signatures, options)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/save.py", line 1325, in save_and_return_nodes
    _build_meta_graph(obj, signatures, options, meta_graph_def))
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/save.py", line 1491, in _build_meta_graph
    return _build_meta_graph_impl(obj, signatures, options, meta_graph_def)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/save.py", line 1437, in _build_meta_graph_impl
    signature_serialization.canonicalize_signatures(signatures))
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/signature_serialization.py", line 180, in canonicalize_signatures
    final_concrete = signature_wrapper._get_concrete_function_garbage_collected(  # pylint: disable=protected-access
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 1219, in _get_concrete_function_garbage_collected
    self._initialize(args, kwargs, add_initializers_to=initializers)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 785, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 2480, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 2711, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 2627, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/func_graph.py", line 1141, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py", line 677, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/func_graph.py", line 1127, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/func_graph.py", line 1116, in autograph_handler
    return autograph.converted_call(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 439, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/__autograph_generated_filelbyy4owv.py", line 12, in tf__signature_wrapper
    structured_outputs = ag__.converted_call(ag__.ld(signature_function), (), dict(**ag__.ld(kwargs)), fscope)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 377, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 458, in _call_unconverted
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 1602, in __call__
    return self._call_impl(args, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/wrap_function.py", line 243, in _call_impl
    return super(WrappedFunction, self)._call_impl(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 1620, in _call_impl
    return self._call_with_flat_signature(args, kwargs, cancellation_manager)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py", line 1652, in _call_with_flat_signature
    raise TypeError(f"{self._flat_signature_summary()} missing required "
TypeError: in user code:


    TypeError: pruned(age, cabin, unknown) missing required arguments: unknown.
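
For context: the extra "unknown" argument appears to be the lookup table's DT_RESOURCE handle, which the pruned signature cannot feed. One way to see where it comes from (my own diagnostic, not part of the repro):

    import tensorflow as tf

    # List the signature's inputs; a captured DT_RESOURCE tensor here is the
    # lookup-table handle that surfaces as the extra "unknown" argument.
    loaded = tf.saved_model.load("/models/model")
    fn = loaded.signatures["serving_default"]
    for tensor in fn.inputs:
        print(tensor.name, tensor.dtype)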

I'm not an expert on TF-TRT, but TRT doesn't support the IndexLookup layer, and ONNX doesn't support this layer either, right?

Also, from your log I don't think TensorRT is involved here, so it would be better to ask in the TensorFlow repo.

@zerollzeng

I'm not an expert on TF-TRT, but TRT doesn't support the IndexLookup layer, and ONNX doesn't support this layer either, right?

When I hit the problem I suspected that neither of the converters supports IndexLookup (or rather TF hash tables), but I didn't find any evidence of that. Should this be documented somewhere, or explained in the error messages?

Also, from your log I don't think TensorRT is involved here, so it would be better to ask in the TensorFlow repo.

Good point. Somehow I thought the converter (which produces the model that can't be saved) was in scope for TensorRT itself, but you're right, it all lives in the TF repo, so I should have filed this bug there.
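
As for workarounds: the only thing I can think of (a sketch only, nothing in this thread confirms it) is to keep the lookup table out of the graph TF-TRT sees, by splitting the model into a string-preprocessing part and a table-free numeric core and converting only the core:

    import tensorflow as tf
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Table-free core: takes the already looked-up integer id instead of the
    # raw string, so this SavedModel contains no DT_RESOURCE tensors.
    age = tf.keras.Input(shape=(1,), dtype=tf.float32, name="age")
    cabin_id = tf.keras.Input(shape=(1,), dtype=tf.int64, name="cabin_id")
    x = tf.keras.layers.Concatenate()([age, tf.cast(cabin_id, tf.float32)])
    core = tf.keras.Model([age, cabin_id], tf.keras.layers.Dense(1)(x))
    core.save("/models/core")

    # Convert only the core; at serving time, apply the StringLookup to the
    # raw string feature in plain TF before calling the converted model.
    converter = trt.TrtGraphConverterV2(input_saved_model_dir="/models/core")
    converter.convert()
    converter.save("/models/core_trt")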

So there are several issues in both the TF and TensorRT repos that mention this problem, without anyone clearly confirming that TensorRT or TF-TRT indeed does not support lookup tables. These issues cross-reference each other; I'm linking to the particular comments that say it "seems like" lookup tables are not supported:
tensorflow/tensorflow#46254 (comment)
tensorflow/tensorrt#233 (comment)
tensorflow/text#486 (comment)

The last comment, from @bixia1, said this was fixed and linked the fix I mentioned in the description, which didn't actually resolve the problem.

I don't think I should file yet another issue like this in the TF repo.

Able to reproduce this in TF 2.9 (using nvidia/tensorflow:22.08-tf2-py3).

I will close this since there has been no activity for a long time. Also, FYI, there are more TF-TRT experts for this issue at https://github.com/tensorflow/tensorrt/issues. Thanks!