Yutong-gannis/ETSAuto

Error when building the TensorRT engine file for CLRNet

CrazyMustard-404 opened this issue · 8 comments

The llamas_dla34_tmp.onnx file was generated successfully, but building the engine file fails. The error is as follows:
[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:776: --- End node ---
[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:779: ERROR: ModelImporter.cpp:180 In function parseGraph:
[6] Invalid Node - Pad_237
[shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::392] Error Code 4: Internal Error (Reshape_226: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
[03/29/2023-16:55:56] [E] Failed to parse onnx file
[03/29/2023-16:55:56] [I] Finish parsing network model
[03/29/2023-16:55:56] [E] Parsing model failed
[03/29/2023-16:55:56] [E] Failed to create engine from model or file.
[03/29/2023-16:55:56] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8402] # trtexec --onnx=./engines/llamas_dla34_tmp.onnx --saveEngine=./engines/llamas_dla34.engine
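
For reference, a minimal sketch (not something done in this thread) for dumping the two nodes named in the error, Reshape_226 and Pad_237, directly from the exported graph; the path matches the trtexec command above:

```python
# Minimal sketch: print the ONNX nodes that TensorRT complains about.
import onnx

model = onnx.load("./engines/llamas_dla34_tmp.onnx")
for node in model.graph.node:
    if node.name in ("Reshape_226", "Pad_237"):
        print(node)
```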

Is this the entire error output?

Is this the entire error output?

Sorry, I only pasted part of it. The complete output is below:

&&&& RUNNING TensorRT.trtexec [TensorRT v8402] # trtexec --onnx=./engines/llamas_dla34_tmp.onnx --saveEngine=./engines/llamas_dla34.engine
[03/29/2023-16:55:55] [I] === Model Options ===
[03/29/2023-16:55:55] [I] Format: ONNX
[03/29/2023-16:55:55] [I] Model: ./engines/llamas_dla34_tmp.onnx
[03/29/2023-16:55:55] [I] Output:
[03/29/2023-16:55:55] [I] === Build Options ===
[03/29/2023-16:55:55] [I] Max batch: explicit batch
[03/29/2023-16:55:55] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[03/29/2023-16:55:55] [I] minTiming: 1
[03/29/2023-16:55:55] [I] avgTiming: 8
[03/29/2023-16:55:55] [I] Precision: FP32
[03/29/2023-16:55:55] [I] LayerPrecisions:
[03/29/2023-16:55:55] [I] Calibration:
[03/29/2023-16:55:55] [I] Refit: Disabled
[03/29/2023-16:55:55] [I] Sparsity: Disabled
[03/29/2023-16:55:55] [I] Safe mode: Disabled
[03/29/2023-16:55:55] [I] DirectIO mode: Disabled
[03/29/2023-16:55:55] [I] Restricted mode: Disabled
[03/29/2023-16:55:55] [I] Build only: Disabled
[03/29/2023-16:55:55] [I] Save engine: ./engines/llamas_dla34.engine
[03/29/2023-16:55:55] [I] Load engine:
[03/29/2023-16:55:55] [I] Profiling verbosity: 0
[03/29/2023-16:55:55] [I] Tactic sources: Using default tactic sources
[03/29/2023-16:55:55] [I] timingCacheMode: local
[03/29/2023-16:55:55] [I] timingCacheFile:
[03/29/2023-16:55:55] [I] Input(s)s format: fp32:CHW
[03/29/2023-16:55:55] [I] Output(s)s format: fp32:CHW
[03/29/2023-16:55:55] [I] Input build shapes: model
[03/29/2023-16:55:55] [I] Input calibration shapes: model
[03/29/2023-16:55:55] [I] === System Options ===
[03/29/2023-16:55:55] [I] Device: 0
[03/29/2023-16:55:55] [I] DLACore:
[03/29/2023-16:55:55] [I] Plugins:
[03/29/2023-16:55:55] [I] === Inference Options ===
[03/29/2023-16:55:55] [I] Batch: Explicit
[03/29/2023-16:55:55] [I] Input inference shapes: model
[03/29/2023-16:55:55] [I] Iterations: 10
[03/29/2023-16:55:55] [I] Duration: 3s (+ 200ms warm up)
[03/29/2023-16:55:55] [I] Sleep time: 0ms
[03/29/2023-16:55:55] [I] Idle time: 0ms
[03/29/2023-16:55:55] [I] Streams: 1
[03/29/2023-16:55:55] [I] ExposeDMA: Disabled
[03/29/2023-16:55:55] [I] Data transfers: Enabled
[03/29/2023-16:55:55] [I] Spin-wait: Disabled
[03/29/2023-16:55:55] [I] Multithreading: Disabled
[03/29/2023-16:55:55] [I] CUDA Graph: Disabled
[03/29/2023-16:55:55] [I] Separate profiling: Disabled
[03/29/2023-16:55:55] [I] Time Deserialize: Disabled
[03/29/2023-16:55:55] [I] Time Refit: Disabled
[03/29/2023-16:55:55] [I] Inputs:
[03/29/2023-16:55:55] [I] === Reporting Options ===
[03/29/2023-16:55:55] [I] Verbose: Disabled
[03/29/2023-16:55:55] [I] Averages: 10 inferences
[03/29/2023-16:55:55] [I] Percentile: 99
[03/29/2023-16:55:55] [I] Dump refittable layers:Disabled
[03/29/2023-16:55:55] [I] Dump output: Disabled
[03/29/2023-16:55:55] [I] Profile: Disabled
[03/29/2023-16:55:55] [I] Export timing to JSON file:
[03/29/2023-16:55:55] [I] Export output to JSON file:
[03/29/2023-16:55:55] [I] Export profile to JSON file:
[03/29/2023-16:55:55] [I]
[03/29/2023-16:55:55] [I] === Device Information ===
[03/29/2023-16:55:55] [I] Selected Device: NVIDIA GeForce RTX 3090
[03/29/2023-16:55:55] [I] Compute Capability: 8.6
[03/29/2023-16:55:55] [I] SMs: 82
[03/29/2023-16:55:55] [I] Compute Clock Rate: 1.785 GHz
[03/29/2023-16:55:55] [I] Device Global Memory: 24575 MiB
[03/29/2023-16:55:55] [I] Shared Memory per SM: 100 KiB
[03/29/2023-16:55:55] [I] Memory Bus Width: 384 bits (ECC disabled)
[03/29/2023-16:55:55] [I] Memory Clock Rate: 9.751 GHz
[03/29/2023-16:55:55] [I]
[03/29/2023-16:55:55] [I] TensorRT version: 8.4.2
[03/29/2023-16:55:55] [I] [TRT] [MemUsageChange] Init CUDA: CPU +492, GPU +0, now: CPU 7429, GPU 1441 (MiB)
[03/29/2023-16:55:56] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +365, GPU +104, now: CPU 7984, GPU 1545 (MiB)
[03/29/2023-16:55:56] [I] Start parsing network model
[03/29/2023-16:55:56] [I] [TRT] ----------------------------------------------------------------
[03/29/2023-16:55:56] [I] [TRT] Input filename: ./engines/llamas_dla34_tmp.onnx
[03/29/2023-16:55:56] [I] [TRT] ONNX IR version: 0.0.6
[03/29/2023-16:55:56] [I] [TRT] Opset version: 11
[03/29/2023-16:55:56] [I] [TRT] Producer name: pytorch
[03/29/2023-16:55:56] [I] [TRT] Producer version: 1.9
[03/29/2023-16:55:56] [I] [TRT] Domain:
[03/29/2023-16:55:56] [I] [TRT] Model version: 0
[03/29/2023-16:55:56] [I] [TRT] Doc string:
[03/29/2023-16:55:56] [I] [TRT] ----------------------------------------------------------------
[03/29/2023-16:55:56] [W] [TRT] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/29/2023-16:55:56] [W] [TRT] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[03/29/2023-16:55:56] [W] [TRT] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[03/29/2023-16:55:56] [E] Error[4]: [shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::392] Error Code 4: Internal Error (Reshape_226: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:773: While parsing node number 237 [Pad -> "496"]:
[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:774: --- Begin node ---
[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:775: input: "313"
input: "494"
input: "495"
output: "496"
name: "Pad_237"
op_type: "Pad"
attribute {
name: "mode"
s: "constant"
type: STRING
}

[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:776: --- End node ---
[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:779: ERROR: ModelImporter.cpp:180 In function parseGraph:
[6] Invalid Node - Pad_237
[shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::392] Error Code 4: Internal Error (Reshape_226: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
[03/29/2023-16:55:56] [E] Failed to parse onnx file
[03/29/2023-16:55:56] [I] Finish parsing network model
[03/29/2023-16:55:56] [E] Parsing model failed
[03/29/2023-16:55:56] [E] Failed to create engine from model or file.
[03/29/2023-16:55:56] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8402] # trtexec --onnx=./engines/llamas_dla34_tmp.onnx --saveEngine=./engines/llamas_dla34.engine

@CrazyMustard-404
Run a diagnosis on the ONNX file first:
polygraphy surgeon sanitize your_path/tusimple_r18.onnx --fold-constants --output your_path/tusimple_r18.onnx
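
Roughly the same fold-constants pass can also be run from Python with onnx-graphsurgeon; a minimal sketch, with file paths matching this thread and an illustrative output name:

```python
# Minimal sketch: fold constant subgraphs in the exported model with onnx-graphsurgeon.
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("engines/llamas_dla34.onnx"))
graph.fold_constants().cleanup()  # roughly what `polygraphy surgeon sanitize --fold-constants` does
onnx.save(gs.export_onnx(graph), "engines/llamas_dla34_folded.onnx")  # output name is illustrative
```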

After running the diagnosis, everything looks normal:
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] RUNNING | Command: D:\anaconda\envs\ADAS\Scripts\polygraphy surgeon sanitize engines/llamas_dla34.onnx --fold-constants --output output/34.onnx
[I] Inferring shapes in the model with onnxruntime.tools.symbolic_shape_infer.
Note: To force Polygraphy to use onnx.shape_inference instead, set allow_onnxruntime=False or use the --no-onnxruntime-shape-inference command-line option.
[I] Loading model: D:\Project\Self-driving-Truck-in-Euro-Truck-Simulator2-main\engines\llamas_dla34.onnx
[I] Original Model:
Name: torch-jit-export | ONNX Opset: 11

---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 800)]}

---- 1 Graph Output(s) ----
{3076 [dtype=float32, shape=(1, 192, 78)]}

---- 222 Initializer(s) ----

---- 2603 Node(s) ----

[I] Folding Constants | Pass 1
[E] Module: 'onnx_graphsurgeon' version '0.3.12' is installed, but version '>=0.3.21' is required.
Please install the required version or set POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to allow Polygraphy to do so automatically.
Attempting to continue with the currently installed version of this module, but note that this may cause errors!
[W] Constant folding pass failed. Skipping subsequent passes.
Note: Error was:
fold_constants() got an unexpected keyword argument 'size_threshold'
[I] Saving ONNX model to: output/34.onnx
[I] New Model:
Name: torch-jit-export | ONNX Opset: 11

---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 800)]}

---- 1 Graph Output(s) ----
{3076 [dtype=float32, shape=(1, 192, 78)]}

---- 222 Initializer(s) ----

---- 2603 Node(s) ----

[I] PASSED | Runtime: 1.856s | Command: D:\anaconda\envs\ADAS\Scripts\polygraphy surgeon sanitize engines/llamas_dla34.onnx --fold-constants --output output/34.onnx
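
Note that the constant-folding pass was skipped here because the installed onnx_graphsurgeon (0.3.12) is older than the required >=0.3.21, so output/34.onnx is most likely structurally unchanged (both models report 2603 nodes). A quick check, assuming the paths from the command above:

```python
# Quick check: compare node counts of the input model and the "sanitized" model.
import onnx

orig = onnx.load("engines/llamas_dla34.onnx")
sanitized = onnx.load("output/34.onnx")
print(len(orig.graph.node), len(sanitized.graph.node))  # both report 2603 in the log above
```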

@CrazyMustard-404 Try converting this 34.onnx to a TensorRT engine.

@CrazyMustard-404 Try converting this 34.onnx to a TensorRT engine.

Tried it; it still fails with the same error.

@CrazyMustard-404 You could try removing lane detection first:

ETSAuto/script/main.py

Lines 78 to 80 in 8f8e367

#im1, bev_lanes = clrnet.forward(cv2.resize(img, (1280, 720)), nav_line, im1, CAM) # pass in the RGB image
# forward object detection line for the vehicle (temporary stand-in for lane lines)
bev_lanes = FCW(nav_line, truck.speed)

@CrazyMustard-404 You could try removing lane detection first:

ETSAuto/script/main.py

Lines 78 to 80 in 8f8e367

#im1, bev_lanes = clrnet.forward(cv2.resize(img, (1280, 720)), nav_line, im1, CAM) # pass in the RGB image
# forward object detection line for the vehicle (temporary stand-in for lane lines)
bev_lanes = FCW(nav_line, truck.speed)

Thanks! The problem is solved; it was caused by the CUDA and TensorRT versions. The final working combination: CUDA 11.8, cuDNN 8.8.0.121_cuda11, TensorRT 8.5.3.1, torch==1.13.1+cu117.
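
For anyone reproducing the fix, a minimal sketch to verify the installed versions from Python (package names assumed to be the standard `torch` and `tensorrt` bindings):

```python
# Minimal environment check for the versions reported as working.
import torch
import tensorrt as trt

print("torch:", torch.__version__)                # expected 1.13.1+cu117
print("CUDA (torch build):", torch.version.cuda)  # CUDA version torch was built against
print("cuDNN:", torch.backends.cudnn.version())   # cuDNN build visible to torch
print("TensorRT:", trt.__version__)               # expected 8.5.3.1
```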