Help please! Could someone take a look at this error?
Closed this issue · 43 comments
I:\fsd\fsd\venv\Scripts\python.exe script\main.py
[01/23/2023-20:24:19] [TRT] [E] 6: The engine plan file is not compatible with this version of TensorRT, expecting library version 8.4.2.4 got 8.4.0.6, please rebuild.
[01/23/2023-20:24:19] [TRT] [E] 4: [runtime.cpp::nvinfer1::Runtime::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
Traceback (most recent call last):
File "script\main.py", line 63, in <module>
clrnet = CLRNet(llamas_engine_path)
File "I:\fsd\fsd\Perception\LaneDetection\clrnet_trt.py", line 75, in __init__
self.context = self.engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'
Process finished with exit code 1
@nizhihao7 You need to generate the engine file yourself from the onnx file; you can't use the one I uploaded.
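As a side note, the AttributeError: 'NoneType' object has no attribute 'create_execution_context' in the log above is a downstream symptom: tensorrt's deserialize_cuda_engine returns None when the plan was built with a different TensorRT version. A minimal defensive sketch (the load_engine helper and its error message are hypothetical, not the repo's actual code):

```python
def load_engine(engine_path, runtime):
    # `runtime` is expected to be a tensorrt.Runtime instance.
    # deserialize_cuda_engine returns None (rather than raising) when the
    # serialized plan does not match the installed TensorRT version.
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    if engine is None:
        raise RuntimeError(
            "Could not deserialize %r; the engine was likely built with a "
            "different TensorRT version, so rebuild it locally." % engine_path
        )
    return engine
```

Failing loudly at this point makes the version mismatch obvious, instead of it surfacing later as a confusing NoneType attribute error on create_execution_context.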
@Yutong-gannis
I'm a bit of a beginner here and have just been following along, so some of these problems may be beyond me. Sorry for the trouble!
May I ask where that onnx file gets generated?
@Yutong-gannis So the idea is: download only the .onnx files into the engines folder, then follow the "build the TensorRT files" step to convert them into .engine files myself, right?
@nizhihao7 Yes. The onnx files from the repo are fine; convert them to engine files following the method in the readme.
@Yutong-gannis The conversion is done; could you take another look?
I:\fsd\fsd\venv\Scripts\python.exe script\main.py
Traceback (most recent call last):
File "script\main.py", line 63, in <module>
clrnet = CLRNet(llamas_engine_path)
File "I:\fsd\fsd\Perception\LaneDetection\clrnet_trt.py", line 73, in __init__
with open(engine_path, "rb") as f, trt.Runtime(self.logger) as runtime:
FileNotFoundError: [Errno 2] No such file or directory: 'I:\fsd\fsd\Engines\llamas_dla34.engine'
Process finished with exit code 1
@nizhihao7 The file was not found. Maybe the file is in the wrong location, or you can pass an absolute path directly in clrnet = CLRNet().
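One way to rule out path mistakes up front is a small existence check before constructing CLRNet. A sketch, assuming CLRNet takes the engine path as its argument; the resolve_engine_path helper is hypothetical, not part of the repo:

```python
from pathlib import Path

def resolve_engine_path(engine_path):
    # Normalize to an absolute path and fail early with a clear message,
    # instead of hitting FileNotFoundError deep inside CLRNet.__init__.
    path = Path(engine_path).resolve()
    if not path.is_file():
        raise FileNotFoundError(
            "Engine file not found: %s; check the Engines folder "
            "or pass an absolute path." % path
        )
    return str(path)
```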
@Yutong-gannis I checked the path, and it looks like I only produced llamas_dla34_tmp.onnx; converting to llamas_dla34.engine reports a failure.
@nizhihao7 Post the full error output.
@Yutong-gannis Here you go:
(venv) PS I:\fsd\fsd> polygraphy surgeon sanitize ./engines/llamas_dla34.onnx --fold-constants --output ./engines/llamas_dla34_tmp.onnx
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] RUNNING | Command: I:\fsd\fsd\venv\Scripts\polygraphy surgeon sanitize ./engines/llamas_dla34.onnx --fold-constants --output ./engines/llamas_dla34_tmp.onnx
[I] Inferring shapes in the model with onnxruntime.tools.symbolic_shape_infer.
    Note: To force Polygraphy to use onnx.shape_inference instead, set allow_onnxruntime=False or use the --no-onnxruntime-shape-inference command-line option.
[I] Loading model: I:\fsd\fsd\engines\llamas_dla34.onnx
[I] Original Model:
Name: torch-jit-export | ONNX Opset: 11
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 800)]}
---- 1 Graph Output(s) ----
{3076 [dtype=float32, shape=(1, 192, 78)]}
---- 222 Initializer(s) ----
---- 2603 Node(s) ----
[I] Folding Constants | Pass 1
[E] Module: 'onnx_graphsurgeon' version '0.3.12' is installed, but version '>=0.3.21' is required.
Please install the required version or set POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to allow Polygraphy to do so automatically.
Attempting to continue with the currently installed version of this module, but note that this may cause errors!
[W] Constant folding pass failed. Skipping subsequent passes.
Note: Error was:
fold_constants() got an unexpected keyword argument 'size_threshold'
[I] Saving ONNX model to: ./engines/llamas_dla34_tmp.onnx
[I] New Model:
Name: torch-jit-export | ONNX Opset: 11
---- 1 Graph Input(s) ----
{input [dtype=float32, shape=(1, 3, 320, 800)]}
---- 1 Graph Output(s) ----
{3076 [dtype=float32, shape=(1, 192, 78)]}
---- 222 Initializer(s) ----
---- 2603 Node(s) ----
[I] PASSED | Runtime: 5.488s | Command: I:\fsd\fsd\venv\Scripts\polygraphy surgeon sanitize ./engines/llamas_dla34.onnx --fold-constants --output ./engines/llamas_dla34_tmp.onnx
(venv) PS I:\fsd\fsd> trtexec --onnx=./engines/llamas_dla34_tmp.onnx --saveEngine=./engines/llamas_dla34.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8402] # C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\trtexec.exe --onnx=./engines/llamas_dla34_tmp.onnx --saveEngine=./engines/llamas_dla34.engine
[01/23/2023-21:28:30] [I] === Model Options ===
[01/23/2023-21:28:30] [I] Format: ONNX
[01/23/2023-21:28:30] [I] Model: ./engines/llamas_dla34_tmp.onnx
[01/23/2023-21:28:30] [I] Output:
[01/23/2023-21:28:30] [I] === Build Options ===
[01/23/2023-21:28:30] [I] Max batch: explicit batch
[01/23/2023-21:28:30] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[01/23/2023-21:28:30] [I] minTiming: 1
[01/23/2023-21:28:30] [I] avgTiming: 8
[01/23/2023-21:28:30] [I] Precision: FP32
[01/23/2023-21:28:30] [I] LayerPrecisions:
[01/23/2023-21:28:30] [I] Calibration:
[01/23/2023-21:28:30] [I] Refit: Disabled
[01/23/2023-21:28:30] [I] Sparsity: Disabled
[01/23/2023-21:28:30] [I] Safe mode: Disabled
[01/23/2023-21:28:30] [I] DirectIO mode: Disabled
[01/23/2023-21:28:30] [I] Restricted mode: Disabled
[01/23/2023-21:28:30] [I] Build only: Disabled
[01/23/2023-21:28:30] [I] Save engine: ./engines/llamas_dla34.engine
[01/23/2023-21:28:30] [I] Load engine:
[01/23/2023-21:28:30] [I] Profiling verbosity: 0
[01/23/2023-21:28:30] [I] Tactic sources: Using default tactic sources
[01/23/2023-21:28:30] [I] timingCacheMode: local
[01/23/2023-21:28:30] [I] timingCacheFile:
[01/23/2023-21:28:30] [I] Input(s)s format: fp32:CHW
[01/23/2023-21:28:30] [I] Output(s)s format: fp32:CHW
[01/23/2023-21:28:30] [I] Input build shapes: model
[01/23/2023-21:28:30] [I] Input calibration shapes: model
[01/23/2023-21:28:30] [I] === System Options ===
[01/23/2023-21:28:30] [I] Device: 0
[01/23/2023-21:28:30] [I] DLACore:
[01/23/2023-21:28:30] [I] Plugins:
[01/23/2023-21:28:30] [I] === Inference Options ===
[01/23/2023-21:28:30] [I] Batch: Explicit
[01/23/2023-21:28:30] [I] Input inference shapes: model
[01/23/2023-21:28:30] [I] Iterations: 10
[01/23/2023-21:28:30] [I] Duration: 3s (+ 200ms warm up)
[01/23/2023-21:28:30] [I] Sleep time: 0ms
[01/23/2023-21:28:30] [I] Idle time: 0ms
[01/23/2023-21:28:30] [I] Streams: 1
[01/23/2023-21:28:30] [I] ExposeDMA: Disabled
[01/23/2023-21:28:30] [I] Data transfers: Enabled
[01/23/2023-21:28:30] [I] Spin-wait: Disabled
[01/23/2023-21:28:30] [I] Multithreading: Disabled
[01/23/2023-21:28:30] [I] CUDA Graph: Disabled
[01/23/2023-21:28:30] [I] Separate profiling: Disabled
[01/23/2023-21:28:30] [I] Time Deserialize: Disabled
[01/23/2023-21:28:30] [I] Time Refit: Disabled
[01/23/2023-21:28:30] [I] Inputs:
[01/23/2023-21:28:30] [I] === Reporting Options ===
[01/23/2023-21:28:30] [I] Verbose: Disabled
[01/23/2023-21:28:30] [I] Averages: 10 inferences
[01/23/2023-21:28:30] [I] Percentile: 99
[01/23/2023-21:28:30] [I] Dump refittable layers:Disabled
[01/23/2023-21:28:30] [I] Dump output: Disabled
[01/23/2023-21:28:30] [I] Profile: Disabled
[01/23/2023-21:28:30] [I] Export timing to JSON file:
[01/23/2023-21:28:30] [I] Export output to JSON file:
[01/23/2023-21:28:30] [I] Export profile to JSON file:
[01/23/2023-21:28:30] [I]
[01/23/2023-21:28:30] [I] === Device Information ===
[01/23/2023-21:28:30] [I] Selected Device: NVIDIA GeForce GTX 1070 Ti
[01/23/2023-21:28:30] [I] Compute Capability: 6.1
[01/23/2023-21:28:30] [I] SMs: 19
[01/23/2023-21:28:30] [I] Compute Clock Rate: 1.683 GHz
[01/23/2023-21:28:30] [I] Device Global Memory: 8191 MiB
[01/23/2023-21:28:30] [I] Shared Memory per SM: 96 KiB
[01/23/2023-21:28:30] [I] Memory Bus Width: 256 bits (ECC disabled)
[01/23/2023-21:28:30] [I] Memory Clock Rate: 4.004 GHz
[01/23/2023-21:28:30] [I]
[01/23/2023-21:28:30] [I] TensorRT version: 8.4.2
[01/23/2023-21:28:31] [I] [TRT] [MemUsageChange] Init CUDA: CPU +234, GPU +0, now: CPU 11874, GPU 1058 (MiB)
[01/23/2023-21:28:31] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2, GPU +0, now: CPU 11934, GPU 1058 (MiB)
[01/23/2023-21:28:31] [I] Start parsing network model
[01/23/2023-21:28:31] [I] [TRT] ----------------------------------------------------------------
[01/23/2023-21:28:31] [I] [TRT] Input filename: ./engines/llamas_dla34_tmp.onnx
[01/23/2023-21:28:31] [I] [TRT] ONNX IR version: 0.0.6
[01/23/2023-21:28:31] [I] [TRT] Opset version: 11
[01/23/2023-21:28:31] [I] [TRT] Producer name: pytorch
[01/23/2023-21:28:31] [I] [TRT] Producer version: 1.9
[01/23/2023-21:28:31] [I] [TRT] Domain:
[01/23/2023-21:28:31] [I] [TRT] Model version: 0
[01/23/2023-21:28:31] [I] [TRT] Doc string:
[01/23/2023-21:28:31] [I] [TRT] ----------------------------------------------------------------
[01/23/2023-21:28:31] [W] [TRT] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/23/2023-21:28:31] [W] [TRT] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[01/23/2023-21:28:31] [W] [TRT] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[01/23/2023-21:28:31] [E] Error[4]: [shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::392] Error Code 4: Internal Error (Reshape_226: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2]
)
[01/23/2023-21:28:31] [E] [TRT] ModelImporter.cpp:773: While parsing node number 237 [Pad -> "496"]:
[01/23/2023-21:28:31] [E] [TRT] ModelImporter.cpp:774: --- Begin node ---
[01/23/2023-21:28:31] [E] [TRT] ModelImporter.cpp:775: input: "313"
input: "494"
input: "495"
output: "496"
name: "Pad_237"
op_type: "Pad"
attribute {
name: "mode"
s: "constant"
type: STRING
}
[01/23/2023-21:28:31] [E] [TRT] ModelImporter.cpp:776: --- End node ---
[01/23/2023-21:28:31] [E] [TRT] ModelImporter.cpp:779: ERROR: ModelImporter.cpp:180 In function parseGraph:
[6] Invalid Node - Pad_237
[shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::392] Error Code 4: Internal Error (Reshape_226: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
[01/23/2023-21:28:31] [E] Failed to parse onnx file
[01/23/2023-21:28:31] [I] Finish parsing network model
[01/23/2023-21:28:31] [E] Parsing model failed
[01/23/2023-21:28:31] [E] Failed to create engine from model or file.
[01/23/2023-21:28:31] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8402] # C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\trtexec.exe --onnx=./engines/llamas_dla34_tmp.onnx --saveEngine=./engines/llamas_dla34.engine
(venv) PS I:\fsd\fsd>
@nizhihao7 Try it without polygraphy surgeon sanitize and see whether it works. Or switch to tensorrt 8.4.2.4.
@Yutong-gannis Welp, still stuck. I looked at [CLRNet-onnxruntime-and-tensorrt-demo] too; it also only gets as far as tmp.onnx, and the second conversion fails in the same way.
@nizhihao7 Try converting to the engine directly, without polygraphy surgeon. If that still fails, consider switching to tensorrt 8.4.2.4.
@Yutong-gannis Direct conversion also fails, and I believe the version I'm using is already 8.4.2.4.
@ywjno Can this problem be solved?
The conversion succeeds with 8.5.2.2.
@Dameng23333 OK, I'll give it a try. Besides upgrading TensorRT to 8.5.2.2, does anything else need to change?
@nizhihao7 I'm on the same version as you, so it doesn't look like a version problem. Try adding the constant-folding step: polygraphy surgeon sanitize model.onnx --fold-constants -o folded.onnx
@Yutong-gannis
polygraphy surgeon sanitize ./engines/llamas_dla34.onnx --fold-constants -o ./engines/llamas_dla34.engine
Converting like this produces an engine file directly. Is that how it's supposed to be done?
@nizhihao7 No, it can't be converted directly like that. Optimize the onnx first, then convert it with trtexec.
@Yutong-gannis That still doesn't seem to work; it only converts once, and the second step always fails. I'll try upgrading to 8.5.2.2.
@Yutong-gannis 8.5.2.2 converts successfully!
This is what happens after the conversion:
I:\fsd\fsd\venv\Scripts\python.exe script\main.py
[01/23/2023-23:10:43] [TRT] [E] 1: [stdArchiveReader.cpp::nvinfer1::rt::StdArchiveReader::StdArchiveReader::40] Error Code 1: Serialization (Serialization assertion stdVersionRead == serializationVersion failed.Version tag does not match. Note: Current Version: 213, Serialized Engine Version: 232)
[01/23/2023-23:10:43] [TRT] [E] 4: [runtime.cpp::nvinfer1::Runtime::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
Traceback (most recent call last):
File "script\main.py", line 63, in <module>
clrnet = CLRNet(llamas_engine_path)
File "I:\fsd\fsd\Perception\LaneDetection\clrnet_trt.py", line 75, in __init__
self.context = self.engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'
Process finished with exit code 1
@nizhihao7 From the error it looks like the tensorrt in your environment wasn't changed. Did you use 8.5 only for the conversion?
- Engine files converted with tensorrt 8.5.x: the conversion succeeds, but the program does not support them. The docs don't mention it, but locally I'm using engine files converted with version 8.4.3.1.
- For tensorrt, not only must the whl version match; the entry in the Path environment variable has to match as well.
- The bundled default engine files seem to be 8.4.0.6, compiled on a 20-series card (not sure whether it was a 2080). If your hardware is the same, you can use them.
I don't know the ML side of things, but I've stepped on basically every pitfall in this project's environment setup, so I suggest reading the Readme carefully.
By the way, have you installed the packages that the weight conversion depends on?
[E] Module: 'onnx_graphsurgeon' version '0.3.12' is installed, but version '>=0.3.21' is required.
Please install the required version or set POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to allow Polygraphy to do so automatically.
The error here says the installed version is wrong. Check the installed onnx-graphsurgeon version with python -m pip list; mine is 0.3.25.
I've tried 8.4.2.4, 8.4.3.1, and 8.5.2.2 myself; every 8.4.x version fails with exactly the error the OP posted in the console output:
[01/23/2023-21:28:31] [E] Error[4]: [shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::392] Error Code 4: Internal Error (Reshape_226: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2]
)
[01/23/2023-21:28:31] [E] [TRT] ModelImporter.cpp:773: While parsing node number 237 [Pad -> "496"]:
[01/23/2023-21:28:31] [E] [TRT] ModelImporter.cpp:774: --- Begin node ---
[01/23/2023-21:28:31] [E] [TRT] ModelImporter.cpp:775: input: "313"
input: "494"
input: "495"
output: "496"
name: "Pad_237"
op_type: "Pad"
attribute {
name: "mode"
s: "constant"
type: STRING
}
[01/23/2023-21:28:31] [E] [TRT] ModelImporter.cpp:776: --- End node ---
- trtexec.exe should normally be under $TENSORRT_PATH/bin. Did you manually copy this file into the $CUDA_PATH/bin folder? (It doesn't matter much, just asking.)
- Is $TENSORRT_PATH/lib added to the Path environment variable?
- Did you restart Powershell after changing the Path environment variable?
[I] Folding Constants | Pass 1
[E] Module: 'onnx_graphsurgeon' version '0.3.12' is installed, but version '>=0.3.21' is required.
Please install the required version or set POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to allow Polygraphy to do so automatically.
Attempting to continue with the currently installed version of this module, but note that this may cause errors!
[W] Constant folding pass failed. Skipping subsequent passes.
Note: Error was:
fold_constants() got an unexpected keyword argument 'size_threshold'
The 8.4.x conversion failure is probably caused by an outdated onnx_graphsurgeon.
First run pip install nvidia-pyindex, then pip install onnx-graphsurgeon==0.3.21.
Once that's installed, just run the conversion again.
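For what it's worth, the version check Polygraphy runs here boils down to comparing dotted version numbers component by component. A toy illustration (for real version strings with pre-release tags, packaging.version is the robust tool):

```python
def version_tuple(v):
    # Good enough for plain dotted versions like "0.3.12"; pre-release
    # suffixes such as "0.3.21rc1" would need packaging.version instead.
    return tuple(int(part) for part in v.split("."))

installed = version_tuple("0.3.12")  # version reported in the [E] log line
required = version_tuple("0.3.21")   # minimum that Polygraphy requires
needs_upgrade = installed < required
```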
OK, I'll upgrade onnx_graphsurgeon first. It took me a while to figure out how to upgrade it.
Running the command from the "install the packages required for weight conversion" step will install the packages the engine conversion depends on.
If it finishes without reporting any package as successfully installed, run python -m pip uninstall -r ./tools/requirements.txt -y and then run the dependency-install command again.
It was indeed onnx_graphsurgeon being too old. After upgrading it, the conversion went through!
@ywjno @Yutong-gannis @Dameng23333 Looks like it works. Thanks, everyone! But what resolution setting works best?
I now have a small problem with the game settings. The whole car drifts to the right, almost riding on the white line on the right, and the lane frequently isn't detected. Could you share a screenshot of your game settings for reference?
The rightward drift is fixed; the game window wasn't positioned properly. But white-line recognition is still poor and the lines frequently aren't seen. Is that a vehicle-lighting problem or a game-settings problem? Sometimes they aren't detected even in daytime, unfortunately.
@ywjno If the results are worse than in my video, it may be a precision issue, or you can tune conf_threshold in clrnet_trt.py.
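To illustrate what tuning conf_threshold does: a lower threshold keeps fainter lane candidates at the cost of more false positives. A toy sketch; the per-lane dict layout here is made up for illustration and is not CLRNet's actual output format:

```python
def filter_lanes(lanes, conf_threshold=0.6):
    # Keep only lane candidates whose confidence clears the threshold.
    # Lowering conf_threshold keeps fainter detections (e.g. poorly lit
    # white lines) but admits more false positives.
    return [lane for lane in lanes if lane["conf"] >= conf_threshold]
```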
The upper part of the detection window recognizes things, but the lower part that determines the driving direction does not, even after changing line 106 of clrnet_trt.py. By the way, self.conf_threshold = 0.6 is assigned the same value repeatedly inside the loop at line 80; is it intentionally written that way?
@ywjno @nizhihao7 For the navigation line you need to install a mod; it's written in the readme.
Those two mods are already installed.
The mods are installed, but the ring road onto the elevated highway sometimes isn't recognized: the navigation line locks onto the lane up on the elevated road and then suddenly turns into the wall. Is that caused by the lane lines not being detected?
@nizhihao7 This may need a separate scenario built for it later.
@ywjno It may really just be like this; the technical approach wasn't the best choice. Segmentation might be a bit more stable.
@Yutong-gannis When installing numpy==1.23.5 and lap, I get "python setup.py bdist_wheel did not run successfully" and can't resolve it. Please help!
@panyuab Try updating your setuptools version.
@Yutong-gannis setuptools is already the latest version.