Yolov8 model inference problem with GPU ( net.forward() )

Question


quentinblechet opened this issue a year ago · 14 comments

Hello,

I'm currently using your code inside a ROS2 node and running inference with YOLOv8. It seems that I have a problem with the forward function. When I use the CUDA backend, the output of forward() has the right size, but the data are always 0, whereas I get normal values when I run it on the CPU.

std::vector<cv::Mat> outputs;
net.forward(outputs, net.getUnconnectedOutLayersNames());

// The output blob is 3-D: [batch, size[1], size[2]]
int rows = outputs[0].size[1];
int dimensions = outputs[0].size[2];

// Print the output shape and the first float value of the blob
std::cout << rows << " / " << dimensions << " / " << outputs[0].ptr<float>()[0] << std::endl;
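Printing only the first element can be misleading, since a single value can legitimately be zero. A minimal sketch in plain C++ (no OpenCV needed) that scans the whole output buffer instead, counting non-zero and non-finite values. It assumes the blob is a continuous CV_32F Mat, which can be checked with `outputs[0].type() == CV_32F` and `outputs[0].isContinuous()`; `OutputSummary`, `summarizeOutput` and `printSummary` are hypothetical helper names, not part of OpenCV or the original code.

```cpp
#include <cmath>
#include <cstddef>
#include <iostream>

// Summary statistics for a raw float tensor (e.g. the buffer behind a cv::Mat blob).
struct OutputSummary {
    std::size_t total = 0;      // number of elements inspected
    std::size_t nonZero = 0;    // finite elements that are != 0.0f
    std::size_t nonFinite = 0;  // NaN or +/-inf elements
    float maxAbs = 0.0f;        // largest finite absolute value
};

OutputSummary summarizeOutput(const float* data, std::size_t n) {
    OutputSummary s;
    s.total = n;
    for (std::size_t i = 0; i < n; ++i) {
        const float v = data[i];
        // Count NaN/inf separately so they don't distort the other statistics.
        if (!std::isfinite(v)) { ++s.nonFinite; continue; }
        if (v != 0.0f) ++s.nonZero;
        const float a = std::fabs(v);
        if (a > s.maxAbs) s.maxAbs = a;
    }
    return s;
}

void printSummary(const char* label, const OutputSummary& s) {
    std::cout << label << ": " << s.nonZero << "/" << s.total << " non-zero, "
              << s.nonFinite << " non-finite, max |v| = " << s.maxAbs << '\n';
}
```

Usage with the outputs vector from the snippet above: `printSummary("cuda", summarizeOutput(outputs[0].ptr<float>(), outputs[0].total()));`. On the CPU a large non-zero count would be expected; the symptom described here would show up as 0 non-zero values on the CUDA backend.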

These are screenshots of my code running on the CPU and on the GPU.

Answer 1 · 2023-08-22T09:23:32.000Z

Hi @quentinblechet, I haven't seen that issue before and I'm not really sure how you would go about fixing it...

The obvious things I would check are:

Your NVIDIA GPU drivers, as well as your OpenCV build (CUDA, cuDNN).
Make sure that ROS has access to those drivers, etc.
Make sure that you've exported your ONNX model correctly (although since it works on the CPU, it appears to be fine).
Try different models, such as a small/medium YOLOv5 model versus a YOLOv8 model, to confirm that the issue is purely to do with the GPU.
Try asking ChatGPT to see if it can come up with anything regarding your problem...

In case you missed it, here's the OpenCV build outline:

All the best and good luck 🚀

Answer 2 · 2023-08-22T09:59:13.000Z

Thanks for the quick answer.

Answer 3 · 2023-08-22T09:59:38.000Z

I have OpenCV 4.8.0. Could that be the problem?

Answer 4 · 2023-08-22T10:02:28.000Z

@quentinblechet In theory it shouldn't matter at all, but only assuming that everything builds properly and whatnot. Can you share your build output?
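To share only the GPU-related part of the build configuration, the summary string returned at runtime by `cv::getBuildInformation()` (essentially the same configuration summary CMake prints) can be filtered line by line. A minimal sketch; `grepLines` is a hypothetical helper and the keyword list is only an example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Returns the lines of `text` that contain at least one of `keys` (case-sensitive).
std::vector<std::string> grepLines(const std::string& text,
                                   const std::vector<std::string>& keys) {
    std::vector<std::string> hits;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        for (const auto& k : keys) {
            if (line.find(k) != std::string::npos) {
                hits.push_back(line);
                break;  // keep each matching line only once
            }
        }
    }
    return hits;
}
```

For example, `grepLines(cv::getBuildInformation(), {"General configuration", "CUDA", "cuDNN", "GPU arch"})` would pick out the version header plus the CUDA, cuDNN and GPU architecture lines.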

Answer 5 · 2023-08-22T11:11:19.000Z

General configuration for OpenCV 4.8.0 =====================================
Version control: unknown

Extra modules:
Location (extra): /home/quentin.blechet/opencv_contrib/modules
Version control (extra): unknown

Platform:
Timestamp: 2023-08-10T08:41:16Z
Host: Linux 5.15.0-78-generic x86_64
CMake: 3.26.4
CMake generator: Unix Makefiles
CMake build tool: /usr/bin/make
Configuration: RELEASE

CPU/HW features:
Baseline: SSE SSE2 SSE3
requested: SSE3
Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
SSE4_1 (18 files): + SSSE3 SSE4_1
SSE4_2 (2 files): + SSSE3 SSE4_1 POPCNT SSE4_2
FP16 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
AVX (8 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
AVX2 (37 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
AVX512_SKX (8 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX

C/C++:
Built as dynamic libs?: YES
C++ standard: 11
C++ Compiler: /usr/bin/c++ (ver 9.4.0)
C++ flags (Release): -fsigned-char -ffast-math -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
C++ flags (Debug): -fsigned-char -ffast-math -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
C Compiler: /usr/bin/cc
C flags (Release): -fsigned-char -ffast-math -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
C flags (Debug): -fsigned-char -ffast-math -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
Linker flags (Release): -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
Linker flags (Debug): -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
ccache: NO
Precompiled headers: NO
Extra dependencies: m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda/lib64 -L/usr/lib/x86_64-linux-gnu
3rdparty dependencies:

OpenCV modules:
To be built: alphamat aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann freetype fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
Disabled: world
Disabled by dependency: -
Unavailable: cvv java julia matlab ovis python2 sfm viz
Applications: tests perf_tests examples apps
Documentation: NO
Non-free algorithms: YES

GUI: GTK3
GTK+: YES (ver 3.24.20)
GThread : YES (ver 2.64.6)
GtkGlExt: NO
VTK support: NO

Media I/O:
ZLib: /usr/lib/x86_64-linux-gnu/libz.so (ver 1.2.11)
JPEG: /usr/lib/x86_64-linux-gnu/libjpeg.so (ver 80)
WEBP: /usr/lib/x86_64-linux-gnu/libwebp.so (ver encoder: 0x020e)
PNG: /usr/lib/x86_64-linux-gnu/libpng.so (ver 1.6.37)
TIFF: /usr/lib/x86_64-linux-gnu/libtiff.so (ver 42 / 4.1.0)
JPEG 2000: OpenJPEG (ver 2.3.1)
OpenEXR: /usr/lib/x86_64-linux-gnu/libImath.so /usr/lib/x86_64-linux-gnu/libIlmImf.so /usr/lib/x86_64-linux-gnu/libIex.so /usr/lib/x86_64-linux-gnu/libHalf.so /usr/lib/x86_64-linux-gnu/libIlmThread.so (ver 2_3)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES

Video I/O:
DC1394: YES (2.2.5)
FFMPEG: YES
avcodec: YES (58.54.100)
avformat: YES (58.29.100)
avutil: YES (56.31.100)
swscale: YES (5.5.100)
avresample: YES (4.0.0)
GStreamer: YES (1.16.3)
v4l/v4l2: YES (linux/videodev2.h)

Parallel framework: pthreads

Trace: YES (with Intel ITT)

Other third-party libraries:
Intel IPP: 2021.8 [2021.8.0]
at: /home/quentin.blechet/opencv/build/3rdparty/ippicv/ippicv_lnx/icv
Intel IPP IW: sources (2021.8.0)
at: /home/quentin.blechet/opencv/build/3rdparty/ippicv/ippicv_lnx/iw
VA: NO
Lapack: NO
Eigen: YES (ver 3.3.7)
Custom HAL: NO
Protobuf: build (3.19.1)
Flatbuffers: builtin/3rdparty (23.5.9)

NVIDIA CUDA: YES (ver 11.6, CUFFT CUBLAS FAST_MATH)
NVIDIA GPU arch: 75
NVIDIA PTX archs:

cuDNN: YES (ver 8.9.3)

OpenCL: YES (no extra features)
Include path: /home/quentin.blechet/opencv/3rdparty/include/opencl/1.2
Link libraries: Dynamic load

Python 3:
Interpreter: /usr/bin/python3 (ver 3.8.10)
Libraries: /usr/lib/x86_64-linux-gnu/libpython3.8.so (ver 3.8.10)
numpy: /home/quentin.blechet/.local/lib/python3.8/site-packages/numpy/core/include (ver 1.24.3)
install path: lib/python3.8/site-packages/cv2/python-3.8

Python (for build): /usr/bin/python2.7

Java:
ant: NO
Java: NO
JNI: NO
Java wrappers: NO
Java tests: NO

Install to: /usr/local

Answer 6 · 2023-08-22T11:21:20.000Z

@quentinblechet I'm not seeing anything obvious here, other than that it's strange that it says 'Version control: unknown' and 'Version control (extra): unknown'.

Can you confirm that you used 'git clone ...' as opposed to downloading the .zip file?

It's important to use 'git clone ...' and then 'git checkout 4.8.0', because when you run CMake (I use cmake-gui for convenience) you'll see that it automatically starts downloading certain packages as it goes.

I'm not saying that this is the case with your build, by the way; I'm just saying that it's weird that it says unknown.

Other than that, I'm not too sure... It's odd that nothing is crashing or failing. Just to confirm as well, you do have an NVIDIA GPU, right?

Answer 7 · 2023-08-22T11:24:20.000Z

Yes, I used git clone.
And yes, I have an NVIDIA GPU.
I really don't see where the problem could come from.

Answer 8 · 2023-08-22T11:26:46.000Z

@quentinblechet Yeah, my apologies for not being able to help more, but I haven't seen or dealt with this particular issue and I'm not really sure how to fix it. The only thing I could suggest would be to rebuild, but that's a long-winded solution that may not even solve anything...

Do let me know if you figure this out at some point, good luck! 🚀

Answer 9 · 2023-08-22T12:58:26.000Z

Just as extra information: it works perfectly with yolov5s.

Answer 10 · 2023-08-22T13:36:01.000Z

@quentinblechet That's rather strange... Have you tried exporting the YOLOv8 model to ONNX yourself? If I remember correctly, you should set 'opset=12'.

Answer 11 · 2023-08-22T13:37:12.000Z

@quentinblechet Otherwise, it might somehow be the transpose function, since YOLOv8 has its output dimensions swapped compared to v5.
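For reference, a YOLOv8 detection export typically produces an output of shape [1, 4 + numClasses, numAnchors] (for example [1, 84, 8400] for an 80-class model at 640x640), whereas YOLOv5 produces [1, numAnchors, 5 + numClasses] (for example [1, 25200, 85]) including an objectness score. With YOLOv8, `size[1]` in the question's snippet is therefore the number of attributes rather than the number of rows. The common approach is to reshape and `cv::transpose` the blob so it can be parsed like YOLOv5; the minimal sketch below instead indexes the channel-major layout directly, which avoids the copy. `Detection` and `decodeYolov8` are hypothetical names, and NMS and box rescaling are left out.

```cpp
#include <cstddef>
#include <vector>

struct Detection {
    float cx, cy, w, h;  // box centre and size, in network input pixels
    float score;         // best class score
    int classId;
};

// Decodes a YOLOv8-style output laid out as [1, numAttrs, numAnchors] with
// numAttrs = 4 + numClasses. Element (attribute a, anchor i) is data[a * numAnchors + i].
std::vector<Detection> decodeYolov8(const float* data, int numAttrs, int numAnchors,
                                    float scoreThreshold) {
    std::vector<Detection> out;
    const int numClasses = numAttrs - 4;
    for (int i = 0; i < numAnchors; ++i) {
        auto at = [&](int a) { return data[static_cast<std::size_t>(a) * numAnchors + i]; };
        // Pick the highest class score for this anchor (YOLOv8 has no objectness term).
        int bestClass = -1;
        float bestScore = 0.0f;
        for (int c = 0; c < numClasses; ++c) {
            const float s = at(4 + c);
            if (bestClass < 0 || s > bestScore) { bestScore = s; bestClass = c; }
        }
        if (bestClass >= 0 && bestScore >= scoreThreshold) {
            out.push_back({at(0), at(1), at(2), at(3), bestScore, bestClass});
        }
    }
    return out;
}
```

On a continuous CV_32F blob with batch size 1, this would be called as `decodeYolov8(outputs[0].ptr<float>(), outputs[0].size[1], outputs[0].size[2], 0.25f)`.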

Answer 12 · 2023-09-03T22:07:38.000Z

Hello @quentinblechet and @JustasBart, just wanted to let you know that I'm using OpenCV 4.8 and facing the same issue: it works on CPU but doesn't work on GPU. I'm using your shell script to export to ONNX format, and it has the opset=12 argument.

I am running on Ubuntu 20.04 with an RTX 3070. YOLOv5 works well on both CPU and GPU, but YOLOv8 works on CPU only.

Answer 13 · 2023-09-04T05:51:58.000Z

Thanks for your input @matheusbg8, it's beginning to sound like something may have changed in OpenCV 4.8.0. I'm actually having trouble building it and setting it up fully the way I used to, as well... I'll check in if I have any updates. In the meantime, OpenCV 4.7.0 should in theory work better, as that's what I was using before.

Answer 14 · 2023-09-07T08:05:33.000Z

I had the same issue with OpenCV 4.8.0, so I switched to 4.7.0, and it works perfectly with YOLOv8.