grimoire/mmdetection-to-tensorrt

Can't use triton-inference-server to deploy the trt engine.

JustinhoCHN opened this issue · 5 comments

Describe the bug
I want to deploy the TensorRT engine with triton-inference-server, but the server fails to load the model.

To Reproduce

I converted the mmdet model to a TensorRT engine with the project's Docker container CLI:

mmdet2trt --fp16 cascade_rcnn_s101_fpn_syncbn-backbone+head_mstrain-range_1x_coco_fp16.py epoch_5.pth output.trt
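
As a sanity check, the engine can usually be loaded back inside the same conversion container with trtexec (the plugin path below is a placeholder for wherever libamirstan_plugin.so was built in that container):

trtexec --loadEngine=output.trt --plugins=/path/to/libamirstan_plugin.so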

This gave me the output.trt engine file. I then created the following directory layout for the model repository:

models
    ├── big_model
    │   └── 1
    │       └── model.plan  # (rename from output.trt)
    └── libamirstan_plugin.so
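
I did not write a config.pbtxt and relied on --strict-model-config=false instead. For reference, a minimal explicit config for a serialized engine would presumably look like the sketch below (the input/output names, types and dims are placeholders, not taken from the actual engine; they would have to match the real engine bindings):

cat > models/big_model/config.pbtxt <<'EOF'
name: "big_model"
platform: "tensorrt_plan"
max_batch_size: 1
input [
  {
    name: "input_0"          # placeholder input binding name
    data_type: TYPE_FP32
    dims: [ 3, -1, -1 ]      # placeholder dims
  }
]
output [
  {
    name: "num_detections"   # placeholder output binding name
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
EOF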

and then tried to serve it with tritonserver:

docker run --rm --gpus device=3 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
  --env LD_PRELOAD=/models/libamirstan_plugin.so -p 8800:8000 -p 8801:8001 -p 8802:8002 \
  -v $(pwd):/models nvcr.io/nvidia/tritonserver:20.08-py3 \
  tritonserver --model-repository=/models --strict-model-config=false --log-verbose=1
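
If the server had come up, I was going to check readiness with something like this (8800 is the HTTP port mapped above; the /v2 endpoint is my assumption for the 20.08 release):

curl -v localhost:8800/v2/health/ready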

However, the server cannot load the model:

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 20.08 (build 15533555)

Copyright (c) 2018-2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

I0224 07:22:55.402466 1 metrics.cc:184] found 1 GPUs supporting NVML metrics
I0224 07:22:55.408652 1 metrics.cc:193]   GPU 0: GeForce RTX 2080 Ti
I0224 07:22:55.409009 1 server.cc:119] Initializing Triton Inference Server
I0224 07:22:55.850319 1 pinned_memory_manager.cc:195] Pinned memory pool is created at '0x7f13f6000000' with size 268435456
I0224 07:22:55.852497 1 netdef_backend_factory.cc:46] Create NetDefBackendFactory
I0224 07:22:55.852517 1 plan_backend_factory.cc:48] Create PlanBackendFactory
I0224 07:22:55.852523 1 plan_backend_factory.cc:55] Registering TensorRT Plugins
I0224 07:22:55.852566 1 logging.cc:52] Registered plugin creator - ::BatchTilePlugin_TRT version 1
I0224 07:22:55.852579 1 logging.cc:52] Registered plugin creator - ::BatchedNMS_TRT version 1
I0224 07:22:55.852600 1 logging.cc:52] Registered plugin creator - ::CoordConvAC version 1
I0224 07:22:55.852638 1 logging.cc:52] Registered plugin creator - ::CropAndResize version 1
I0224 07:22:55.852645 1 logging.cc:52] Registered plugin creator - ::DetectionLayer_TRT version 1
I0224 07:22:55.852653 1 logging.cc:52] Registered plugin creator - ::FlattenConcat_TRT version 1
I0224 07:22:55.852660 1 logging.cc:52] Registered plugin creator - ::GenerateDetection_TRT version 1
I0224 07:22:55.852671 1 logging.cc:52] Registered plugin creator - ::GridAnchor_TRT version 1
I0224 07:22:55.852683 1 logging.cc:52] Registered plugin creator - ::GridAnchorRect_TRT version 1
I0224 07:22:55.852709 1 logging.cc:52] Registered plugin creator - ::InstanceNormalization_TRT version 1
I0224 07:22:55.852716 1 logging.cc:52] Registered plugin creator - ::LReLU_TRT version 1
I0224 07:22:55.852723 1 logging.cc:52] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
I0224 07:22:55.852734 1 logging.cc:52] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
I0224 07:22:55.852742 1 logging.cc:52] Registered plugin creator - ::NMS_TRT version 1
I0224 07:22:55.852749 1 logging.cc:52] Registered plugin creator - ::Normalize_TRT version 1
I0224 07:22:55.852757 1 logging.cc:52] Registered plugin creator - ::PriorBox_TRT version 1
I0224 07:22:55.852765 1 logging.cc:52] Registered plugin creator - ::ProposalLayer_TRT version 1
I0224 07:22:55.852772 1 logging.cc:52] Registered plugin creator - ::Proposal version 1
I0224 07:22:55.852779 1 logging.cc:52] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I0224 07:22:55.852785 1 logging.cc:52] Registered plugin creator - ::Region_TRT version 1
I0224 07:22:55.852793 1 logging.cc:52] Registered plugin creator - ::Reorg_TRT version 1
I0224 07:22:55.852802 1 logging.cc:52] Registered plugin creator - ::ResizeNearest_TRT version 1
I0224 07:22:55.852810 1 logging.cc:52] Registered plugin creator - ::RPROI_TRT version 1
I0224 07:22:55.852816 1 logging.cc:52] Registered plugin creator - ::SpecialSlice_TRT version 1
I0224 07:22:55.852828 1 onnx_backend_factory.cc:53] Create OnnxBackendFactory
I0224 07:22:55.860046 1 libtorch_backend_factory.cc:53] Create LibTorchBackendFactory
I0224 07:22:55.860167 1 custom_backend_factory.cc:46] Create CustomBackendFactory
I0224 07:22:55.860172 1 backend_factory.h:44] Create TritonBackendFactory
I0224 07:22:55.860203 1 ensemble_backend_factory.cc:47] Create EnsembleBackendFactory
I0224 07:22:55.860364 1 autofill.cc:142] TensorFlow SavedModel autofill: Internal: unable to autofill for 'big_model', unable to find savedmodel directory named 'model.savedmodel'
I0224 07:22:55.860396 1 autofill.cc:155] TensorFlow GraphDef autofill: Internal: unable to autofill for 'big_model', unable to find graphdef file named 'model.graphdef'
I0224 07:22:55.860420 1 autofill.cc:168] PyTorch autofill: Internal: unable to autofill for 'big_model', unable to find PyTorch file named 'model.pt'
I0224 07:22:55.860450 1 autofill.cc:180] Caffe2 NetDef autofill: Internal: unable to autofill for 'big_model', unable to find netdef files: 'model.netdef' and 'init_model.netdef'
I0224 07:22:56.123378 1 autofill.cc:376] failed to load /models/big_model/1/model.plan: Internal: onnx runtime error 1: /workspace/onnxruntime/onnxruntime/core/session/inference_session.cc:279 onnxruntime::InferenceSession::InferenceSession(const onnxruntime::SessionOptions&, const onnxruntime::Environment&, const void*, int) result was false. Could not parse model successfully while constructing the inference session

I0224 07:22:56.123459 1 autofill.cc:212] ONNX autofill: Internal: unable to autofill for 'big_model', unable to find onnx file
WARNING: Since openmp is enabled in this build, this API cannot be used to configure intra op num threads. Please use the openmp environment variables to control the number of threads.
E0224 07:23:19.480148 1 logging.cc:43] coreReadArchive.cpp (38) - Serialization Error in verifyHeader: 0 (Version tag does not match)
E0224 07:23:19.516318 1 logging.cc:43] INVALID_STATE: std::exception
E0224 07:23:19.516344 1 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
I0224 07:23:19.548534 1 autofill.cc:225] TensorRT autofill: Internal: unable to autofill for 'big_model', unable to find a compatible plan file.
W0224 07:23:19.548552 1 autofill.cc:265] Proceeding with simple config for now
I0224 07:23:19.548576 1 model_config_utils.cc:629] autofilled config: name: "big_model"

E0224 07:23:19.558529 1 model_repository_manager.cc:1633] unexpected platform type  for big_model
error: creating server: Internal - failed to load all models

Can anyone please point out what I'm missing? Thank you.

Environment:

  • OS: Ubuntu 18.04
  • python_version: 3.6
  • pytorch_version: 1.6
  • cuda_version: 10.2
  • cudnn_version: unknown (I used the provided Docker image)
  • mmdetection_version: 2.9.0

@grimoire @daavoo @vedrusss,

I'm facing the same issue for the same problem statement. How can we run MMDetection (or the TRT-converted version) on NVIDIA's Triton Inference Server?

If you can point me in the right direction, that also helps.

@animikhaich Sorry, I do not have experience with Triton Inference Server.
If you want to deploy mmdetection without TRT, you can open an issue on the mmdetection repo. Hope they can help you.

@grimoire Thanks for the quick response. I will do as advised.

m-nny commented

@animikhaich Did you manage to run the mmdet model in Triton?

I'm currently facing a similar issue:

E0622 19:01:19.653139 1 logging.cc:43] coreReadArchive.cpp (32) - Serialization Error in verifyHeader: 0 (Magic tag does not match)
E0622 19:01:19.665819 1 logging.cc:43] INVALID_STATE: std::exception
E0622 19:01:19.665845 1 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
I0622 19:01:19.702299 1 autofill.cc:209] TensorRT autofill: Internal: unable to autofill for 'detr_kudaisaktasyn', unable to find a compatible plan file.
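
My guess is that these verifyHeader errors mean the engine was serialized with a TensorRT version different from the one shipped inside the Triton image, since serialized engines are not portable across TensorRT versions. A rough way to compare the two (assuming TensorRT is installed as Debian packages in both images; the first image name is a placeholder):

docker run --rm <image-used-to-build-the-engine> bash -c "dpkg -l | grep -i nvinfer"
docker run --rm nvcr.io/nvidia/tritonserver:20.08-py3 bash -c "dpkg -l | grep -i nvinfer"

If the versions differ, rebuilding the engine against the same TensorRT version as the Triton release (or picking a Triton tag that matches the build environment) should make the plan deserializable.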

@m-nny, I ended up converting Detectron2 ResNet50 FasterRCNN weights to TorchScript and serving that with Triton Inference Server. That worked for me.

I did not continue with MMDetection after that, but I believe a similar approach (exporting to TorchScript and serving that) may work there as well. You can try it out and let us know.