CESNET/GPUJPEG

Trouble with cudaStreamSynchronize


Hi @MartinPulec,

I updated my version of GPUJPEG and did a clean rebuild, but now I have a weird problem on my Ubuntu 20.04 machine.
I am just wondering if you have an idea why this happens?

/usr/bin/ld:gpujpeg/linux/lib/libgpujpeg.a(gpujpeg_decoder.c.o): undefined reference to symbol 'cudaStreamSynchronize@@libcudart.so.11.0'
/usr/bin/ld: /usr/local/cuda-11.2/lib64/libcudart.so: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status

How do you compile the library and link against it? Since you are using the static library, I'd say that -lcudart is missing from your app's link command, but I'm not sure because of the second line of your output.
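
For reference, a minimal sketch of a consumer CMakeLists.txt that links the static library and then names cudart explicitly. The target name my_app, the source file app.c, and the GPUJPEG_ROOT variable are placeholders, not taken from the reporter's project; the archive path is only modeled on the one in the error output above. find_package(CUDAToolkit) needs CMake 3.17 or newer.

```cmake
cmake_minimum_required(VERSION 3.17)
project(my_app LANGUAGES C CXX)

# Hypothetical location of the GPUJPEG build; adjust to your tree.
set(GPUJPEG_ROOT "${CMAKE_SOURCE_DIR}/gpujpeg/linux" CACHE PATH "GPUJPEG prefix")

# Provides the imported target CUDA::cudart (CMake >= 3.17).
find_package(CUDAToolkit REQUIRED)

add_executable(my_app app.c)

# The archive is built from .c, .cpp and .cu sources, so link with the C++
# driver to pull in the C++ runtime as well.
set_target_properties(my_app PROPERTIES LINKER_LANGUAGE CXX)

# A static archive records no dependencies of its own, so the application
# has to name the CUDA runtime itself, after the archive that needs it.
target_link_libraries(my_app PRIVATE
    "${GPUJPEG_ROOT}/lib/libgpujpeg.a"
    CUDA::cudart)
```

With a hand-written link command, the equivalent is to put -lcudart after libgpujpeg.a.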

Just out of curiosity: is there any reason not to use the system CUDA 11.2, since it is already distributed with Ubuntu 20.04?

Not sure. I always install the CUDA toolkit 11.2 and then use it to build my project.
But it's weird that this problem appeared suddenly.

BTW, another issue appears with the latest code. When GPUJPEG is built statically, main.c is included in the static library.
If you later link the library into another application, you can get a duplicate symbol main.
I don't think main.c should be part of the library.
In my case I just added this line to fix it:

--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -110,6 +110,8 @@ message(STATUS "Configured options: ${COMPILED_OPTIONS}")
 # GPUJPEG library
 file(GLOB H_FILES libgpujpeg/*.h ${CMAKE_CURRENT_BINARY_DIR}/libgpujpeg/gpujpeg_version.h)
 file(GLOB C_FILES src/*.c src/*/*.cpp src/*.cu)
+list(REMOVE_ITEM C_FILES "${CMAKE_CURRENT_SOURCE_DIR}/src/main.c")
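
A possible alternative to removing the file after the glob, sketched as a fragment of the project's CMakeLists.txt. It assumes the library target is called gpujpeg and that the command-line tool can be built as its own target; gpujpeg-bin is a name chosen here for illustration, and none of this has been checked against the project's actual build script.

```cmake
# Library sources, without the command-line front end.
# file(GLOB) returns absolute paths, so match on the path suffix.
file(GLOB C_FILES src/*.c src/*/*.cpp src/*.cu)
list(FILTER C_FILES EXCLUDE REGEX "/src/main\\.c$")
add_library(gpujpeg ${C_FILES})

# The command-line tool gets main.c on its own and links against the library
# (target name is hypothetical).
add_executable(gpujpeg-bin src/main.c)
target_link_libraries(gpujpeg-bin PRIVATE gpujpeg)
```

Either way, the point is the same: main.c ends up only in the executable, so applications linking libgpujpeg.a can define their own main.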

> Not sure. I always install the CUDA toolkit 11.2 and then use it to build my project.
> But it's weird that this problem appeared suddenly.

Unfortunately I am not yet able to reproduce that on Ubuntu 20.04. So I think it may be caused either by a different CUDA distribution or by different configure flags. Which ones do you use?

> BTW, another issue appears with the latest code. When GPUJPEG is built statically, main.c is included in the static library.
> If you later link the library into another application, you can get a duplicate symbol main.
> I don't think main.c should be part of the library.
> In my case I just added this line to fix it:
>
> --- a/CMakeLists.txt
> +++ b/CMakeLists.txt
> @@ -110,6 +110,8 @@ message(STATUS "Configured options: ${COMPILED_OPTIONS}")
>  # GPUJPEG library
>  file(GLOB H_FILES libgpujpeg/*.h ${CMAKE_CURRENT_BINARY_DIR}/libgpujpeg/gpujpeg_version.h)
>  file(GLOB C_FILES src/*.c src/*/*.cpp src/*.cu)
> +list(REMOVE_ITEM C_FILES "${CMAKE_CURRENT_SOURCE_DIR}/src/main.c")

Correct, thanks, I didn't notice. I'll fix it in a moment.

I've tried to reproduce the issue in a clean Ubuntu 20.04 container with a downloaded CUDA 11.2 (sorry for the mistake, Ubuntu 20.04 ships CUDA 10.1) with:

cmake -DBUILD_SHARED_LIBS=OFF -DCMAKE_CUDA_COMPILER=/usr/local/cuda-11.2/bin/nvcc

(I needed to add CUDA to include_directories in CMakeLists.txt; this is already pushed.)

Anyway, I'd suggest checking whether the distro CUDA is accidentally present on the system. I don't know the exact mechanism, but that is how I managed to create a broken libgpujpeg.so: it was not linked to libcudart, yet it didn't contain the symbols either. You can check this with:

 nm  libgpujpeg.so | grep cudaStreamSynchronize
 ldd libgpujpeg.so | grep cudart

The symbol should either be defined ('t' or 'T'), or the library should be linked to cudart. Of course, the above applies only to dynamic linking. But even if you intend to build statically, it may be useful to build a dynamic version to see whether this is a related issue.

When I run nm, cudaStreamSynchronize shows up as undefined, but the library is linked with cudart:
U cudaStreamSynchronize@@libcudart.so.11.0

This is really weird.

@MartinPulec after reverting my code to an older version, I found the change.
My application needs to link with both cudart and cuda. When I link in the order cuda;cudart I get the issue; if I flip the order, it works ...

I am not fully sure that makes sense, but this is the second time I have rebuilt everything, and it works with your latest master.
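
In CMake terms, the change described above might look like the fragment below. The target name my_app and the variable CUDA_LIBS are placeholders, and the position of gpujpeg in the list is a guess, since the thread does not show the full link line. Why the order matters is not established here; one plausible factor is that GNU ld processes libraries left to right and that Ubuntu's toolchain enables --as-needed by default, but that is an assumption, not a diagnosis.

```cmake
# Order that produced "DSO missing from command line" for the reporter:
#   set(CUDA_LIBS cuda cudart)
# Order that linked cleanly:
set(CUDA_LIBS cudart cuda)

# my_app is a placeholder for the application target; gpujpeg is placed
# first on the assumption that its CUDA dependencies should follow it.
target_link_libraries(my_app PRIVATE gpujpeg ${CUDA_LIBS})
```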

> U cudaStreamSynchronize@@libcudart.so.11.0

Are other CUDA symbols defined? I guess that either the symbol must be defined or libcudart.so should be referenced as a dependency. How do you build the lib? I tried a similar setup, but it ended up linking with the CUDA static stub, so there were no unsatisfied dependencies in the library.