tensorflow/tensorrt

Serving a TF-TRT converted model returns error: NodeDef mentions attr 'max_batch_size' not in Op: name=TRTEngineOp

biaochen opened this issue · 0 comments

I want to use TF-TRT to optimize a TF2 model and then serve it with Triton, but serving the optimized TF-TRT model fails. Here is the process:

  1. Following this tutorial (https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#introduction), create a TF-TRT optimized model.
     I ran the code inside the image nvcr.io/nvidia/tensorflow:22.07-tf2-py3 and successfully created both the native model and the converted model (a rough sketch of the conversion code follows the tree below):
models/
├── native_saved_model
│   ├── assets
│   ├── keras_metadata.pb
│   ├── saved_model.pb
│   └── variables
│       ├── variables.data-00000-of-00001
│       └── variables.index
└── tftrt_saved_model
    ├── assets
    │   └── trt-serialized-engine.TRTEngineOp_000_000
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index
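
The conversion itself was essentially the user-guide example. A minimal sketch, assuming the paths from the tree above, FP32 precision, and a single (1, 28, 28) input for pre-building the engine (these exact parameters are assumptions):

# Minimal sketch of the TF-TRT conversion step (paths from the tree above;
# precision mode and the build-time input shape are assumptions).
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="models/native_saved_model",
    precision_mode=trt.TrtPrecisionMode.FP32,
)
converter.convert()

# Pre-building the engine is what serializes it under assets/
# (i.e. produces trt-serialized-engine.TRTEngineOp_000_000).
def input_fn():
    yield (np.zeros((1, 28, 28), dtype=np.float32),)

converter.build(input_fn=input_fn)
converter.save("models/tftrt_saved_model")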
  2. Copy the native and converted models into a model repository, and create the directory structure that Triton expects:
├── mnist
│   ├── 1
│   │   └── model.savedmodel
│   │       ├── assets
│   │       ├── keras_metadata.pb
│   │       ├── saved_model.pb
│   │       └── variables
│   │           ├── variables.data-00000-of-00001
│   │           └── variables.index
│   └── config.pbtxt
└── mnist_trt
    ├── 1
    │   └── model.savedmodel
    │       ├── assets
    │       │   └── trt-serialized-engine.TRTEngineOp_000_000
    │       ├── saved_model.pb
    │       └── variables
    │           ├── variables.data-00000-of-00001
    │           └── variables.index
    └── config.pbtxt

the native model is copied under mnist/1/model.savedmodel, with config.pbtxt like this:

name: "mnist"
platform: "tensorflow_savedmodel"
max_batch_size : 0

The converted model is copied under mnist_trt/1/model.savedmodel, with the same config.pbtxt apart from the name, which is set to "mnist_trt" to match the directory.

  3. Start the Triton server inside the container nvcr.io/nvidia/tritonserver:22.07-py3. The log shows that both models are loaded successfully (a quick client-side readiness check is sketched below).
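
A minimal client-side check that both models are actually ready, using the standard tritonclient HTTP API (same assumed SERVER_IP placeholder as in the inference script below):

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="SERVER_IP:8000")
print(client.is_server_ready())            # expect True
print(client.is_model_ready("mnist"))      # expect True
print(client.is_model_ready("mnist_trt"))  # expect True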

  4. Try to infer. The client code looks like this:

import numpy as np
import tritonclient.http as httpclient

# Set up the client
url = 'SERVER_IP:8000'
triton_client = httpclient.InferenceServerClient(url=url)

# Build the input tensor: a single [1, 28, 28] FP32 image for "flatten_input"
input1_shape = [1, 28, 28]
input1 = httpclient.InferInput("flatten_input", input1_shape, datatype="FP32")
input1_data = np.arange(1 * 28 * 28).reshape(1, 28, 28).astype(np.float32)
print('input1_data: ', input1_data)
input1.set_data_from_numpy(input1_data, binary_data=False)

# Request the "dense_1" output as top-10 classification results
test_output = httpclient.InferRequestedOutput("dense_1", binary_data=False, class_count=10)

# Query the server
model_name = "mnist"
results = triton_client.infer(model_name=model_name, inputs=[input1], outputs=[test_output])
print(results.as_numpy('dense_1'))

If model_name is mnist, the inference succeeds and prints the prediction result:

[['9575.137695:3' '9021.530273:2' '5957.917969:7' '-416.794525:5'
'-6797.246582:9' '-8895.693359:1' '-9928.074219:0' '-15507.916016:8'
'-22406.882812:6' '-29679.443359:4']]

However, after changing model_name to mnist_trt, the call fails with this error message:

tritonclient.utils.InferenceServerException: NodeDef mentions attr 'max_batch_size' not in Op<name=TRTEngineOp; signature=in_tensor: -> out_tensor:; attr=serialized_segment:string; attr=segment_func:func,default=[]; attr=InT:list(type),min=1,allowed=[DT_INT8, DT_HALF, DT_FLOAT, DT_INT32]; attr=OutT:list(type),min=1,allowed=[DT_INT8, DT_HALF, DT_FLOAT, DT_INT32]; attr=max_cached_engines_count:int,default=1; attr=workspace_size_bytes:int; attr=precision_mode:string,allowed=["FP32", "FP16", "INT8"]; attr=calibration_data:string,default=""; attr=use_calibration:bool,default=true; attr=input_shapes:list(shape),default=[]; attr=output_shapes:list(shape),default=[]; attr=segment_funcdef_name:string,default=""; attr=cached_engine_batches:list(int),default=[],min=0; attr=fixed_input_size:bool,default=true; attr=static_engine:bool,default=true>; NodeDef: {{node TRTEngineOp_000_000}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[PartitionedCall/PartitionedCall/TRTEngineOp_000_000]]

I guess it might be a version mismatch between the TensorFlow that produced the TF-TRT graph (the 22.07 TF2 container) and the TensorFlow backend Triton uses to load it, as the error itself hints ("Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary")?
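
In case it helps with diagnosis, the server-side view of the two models can be compared with the same tritonclient HTTP API; a minimal sketch (same assumed SERVER_IP placeholder as above, output omitted):

import json
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="SERVER_IP:8000")
# Compare what the server reports for the native and the TF-TRT model
for name in ("mnist", "mnist_trt"):
    print(json.dumps(client.get_model_metadata(name), indent=2))
    print(json.dumps(client.get_model_config(name), indent=2))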