vetter/shoc

Make with CUDA support fails

bald34 opened this issue · 4 comments

Hello, Dakar team.

I'm trying to build SHOC 1.1.5 with CUDA and MPI support on CentOS 6.5 x64. Our machine has CUDA SDK 6.5, OpenMPI 1.8.1, the Intel C++ compiler, and Intel MKL 11.1 installed.

The configure command "./configure CPPFLAGS="-I/usr/local/cuda/include" --with-cuda --with-mpi" completes without errors. Its output contains the following lines:
configure: checking for usable OpenCL opencl.h header
checking OpenCL/opencl.h usability... yes
checking OpenCL/opencl.h presence... yes
checking for OpenCL/opencl.h... yes
checking for usable OpenCL library... -lOpenCL
checking for nvcc... /usr/local/cuda/bin/nvcc
checking cuda.h usability... yes
checking cuda.h presence... yes
checking for cuda.h... yes
checking cuda_runtime.h usability... yes
checking cuda_runtime.h presence... yes
checking for cuda_runtime.h... yes
checking for cublasInit in -lcublas... yes
checking for cufftPlan1d in -lcufft... yes
checking for mpicxx... /usr/local/mpi/bin/mpicxx
checking whether we can compile an MPI program using /usr/local/mpi/bin/mpicxx... yes
checking whether we can link an MPI program using /usr/local/mpi/bin/mpicxx... yes

So I concluded that CUDA, OpenCL, and MPI were all found successfully. But the make command fails:
/usr/local/mpi/bin/mpicxx -g -O2 -L../../../../src/cuda/common -L../../../../src/common -o BusSpeedDownload main.o BusSpeedDownload.o -lSHOCCommon "/tmp/tmpxft_00007322_00000000-16_bogus.o" "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib/stubs" "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib" -lcudadevrt -lcudart_static -lrt -lpthread -ldl -lrt -lrt
icpc: error #10236: File not found: '/tmp/tmpxft_00007322_00000000-16_bogus.o'

The PATH and LD_LIBRARY_PATH variables are set to the proper values. Other CUDA-capable applications, such as CUDA-accelerated HPL, work fine. Could you help me find where the problem is?
Thank you.

We are attempting to reproduce the problem on a local system, but so far have not been able to. The error message points to a problem with the find_cuda_libs.sh script. In your case, it appears that the script didn't parse the output of nvcc -dryrun correctly, leaving the bogus.o path in the library spec.

Would you please add a comment containing the output of running the config/find_cuda_libs.sh script on your system? It takes one argument - the path to the nvcc that you are using.
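To illustrate the kind of filtering the parsing step described above needs to do, here is a minimal shell sketch. It is not the actual find_cuda_libs.sh; the function name filter_link_spec is hypothetical, and the sketch assumes the link spec is a single space-separated string whose tokens contain no embedded spaces or glob characters. It keeps only the -L and -l tokens and drops everything else, including temporary object files such as the tmpxft_*_bogus.o entry.

```shell
#!/bin/sh
# Hypothetical helper, not part of SHOC: keep only library search paths (-L...)
# and library names (-l...) from a link spec, dropping any other token
# (for example a temporary object file left over from an nvcc dry run).
# Assumes tokens contain no embedded spaces or glob characters.
filter_link_spec() {
    out=""
    for tok in $1; do
        # Strip one pair of surrounding double quotes before testing the token.
        bare=${tok#\"}
        bare=${bare%\"}
        case "$bare" in
            -L*|-l*)
                # Keep the token exactly as it appeared, quotes included.
                out="${out:+$out }$tok"
                ;;
            *)
                # Anything else (object files, stray paths) is dropped.
                ;;
        esac
    done
    printf '%s\n' "$out"
}
```

Given a spec that begins with a quoted temporary object file, the function would print the same spec without that entry, which is the form the linker needs.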

This problem may have been fixed with a recent change to find_cuda_libs.sh (commit b3ebb6f, from Friday August 29). We encourage the original poster to be sure his or her local clone is up-to-date with the repository. If so, and the problem persists, please post the output of nvcc --dryrun as described in the earlier comment.

Hi, rothpc.

You were right, the latest commit fixes this bug. Yesterday I was in fact using an older checkout, because I had downloaded it several days earlier. I'm sorry for the oversight.

Below are the outputs of config/find_cuda_libs.sh from the old and new checkouts, as you asked. They differ (the old output comes first):
[bald@node8 config]$ which nvcc
/usr/local/cuda/bin/nvcc
[bald@node8 config]$ ./find_cuda_libs.sh /usr/local/cuda/bin/nvcc
"/tmp/tmpxft_00003d15_00000000-16_bogus.o" "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib/stubs" "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib" -lcudadevrt -lcudart_static -lrt -lpthread -ldl

[bald@node8 config]$ ./find_cuda_libs.sh /usr/local/cuda/bin/nvcc
"-L/usr/local/cuda/bin/../targets/x86_64-linux/lib/stubs" "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib" -lcudadevrt -lcudart_static -lrt -lpthread -ldl

Thank you for the quick reply. I also have one more question, about MaxFlops performance on multiple GPUs: the result doesn't change whether I choose one GPU or more than one. If you would prefer that I open a new ticket (and close this one), I'll do so.

The bug was fixed.