Considerations on compiling CUDA programs without a capable device (GPU). Short descriptions of findings gained while working on the parallelization of a Java application using the Parallel Java 2 Library (PJ2) and CUDA. The work was carried out on a Windows box with no CUDA device; a CUDA-capable AWS G2 instance was used for testing.
To compile CUDA programs one needs a CUDA-capable device (GPU), the device driver, and the CUDA Toolkit, each dependent on its predecessor. To compile programs that load kernels onto CUDA devices, Visual Studio needs to be in place as well. Take a look at NVIDIA's documentation for an in-depth understanding of developing CUDA programs. This section explores how far one can get without a CUDA device.
Download and install Cygwin (x86_64) with the GCC compiler suite. Consider a full install to avoid problems due to missing packages. Download the CUDA Toolkit and extract its content (e.g. using UniExtract).
...run the commands in the box below. This compiles one of the CUDA samples that don't load a kernel; the resulting program should work on any CUDA-capable machine. The compiler option -static ensures that the GCC runtime libraries end up inside the binary. The library referenced by the options -L$CUDA_HOME/lib/x64 -lcudart is the NVIDIA CUDA Runtime API, a wrapper for cudart64_75.dll (as seen in the output of nm $CUDA_HOME/lib/x64/cudart.lib). The directory containing the DLL needs to be part of PATH. Check dynamic references with ldd.
# CUDA Toolkit 7.5
export CUDA_HOME=/usr/lab/cudacons/cuda_7.5.18_windows/CUDAToolkit
export PATH=$CUDA_HOME/bin:$PATH
# CUDA Toolkit 8
export CUDA_HOME=/usr/lab/cudacons/cuda_8.0.44_windows/compiler
export PATH=$CUDA_HOME/../cudart/bin:$PATH
x86_64-w64-mingw32-c++ \
-I$CUDA_HOME/include \
-I$CUDA_HOME/../CUDASamples/common/inc \
-o deviceQuery.exe $CUDA_HOME/../CUDASamples/1_Utilities/deviceQuery/deviceQuery.cpp \
-static -L$CUDA_HOME/lib/x64 -lcudart
# CUDA Toolkit 11.1
export CUDA_HOME=/usr/lab/cudacons/cuda_11.1.1_456.81_win10/cuda_nvcc/nvcc
export PATH=$CUDA_HOME/bin:$PATH
x86_64-w64-mingw32-c++ \
-I$CUDA_HOME/include \
-I$CUDA_HOME/../../cuda_cudart/cudart/include \
-I$CUDA_HOME/../../cuda_Samples/cuda_Samples/common/inc \
-o deviceQuery.exe $CUDA_HOME/../../cuda_Samples/cuda_Samples/1_Utilities/deviceQuery/deviceQuery.cpp \
-static -L$CUDA_HOME/../../cuda_cudart/cudart/lib/x64 -lcudart
ldd ./deviceQuery.exe
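Whether the loader will actually find the runtime DLL at start-up can be checked without running the program. The helper below is an illustrative sketch, not part of any toolkit (the name find_dll is invented here); it mimics the Windows loader's PATH search from a Cygwin shell:

```shell
#!/bin/sh
# find_dll NAME -- scan the directories on PATH for a DLL, the way the
# Windows loader resolves a DLL when the program starts.
# Runs in a subshell (...) so changing IFS stays local.
find_dll() (
    IFS=:
    for d in $PATH; do
        [ -f "$d/$1" ] && { echo "$d/$1"; exit 0; }
    done
    echo "$1: not found on PATH" >&2
    exit 1
)

# Example: the DLL name taken from the nm output above.
# find_dll cudart64_75.dll
```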
...run the commands in the box below (again, no CUDA kernels are loaded). The example is from the PJ2 distribution, which targets Linux only. PJ2 contains a shared object libEduRitGpuCuda.so that connects the Java side with CUDA. The illustrated example migrates the .so to a .dll. The library referenced by the options -L$CUDA_HOME/lib/x64 -lcuda is the NVIDIA CUDA Driver API, a wrapper for nvcuda.dll (see the output of nm $CUDA_HOME/lib/x64/cuda.lib). The DLL is part of the NVIDIA driver package, where its name is nvcuda64.dl_. Expand the file to nvcuda.dll and make sure the dynamic linker can find it. Run ldd to check the references.
export JAVA_HOME=/cygdrive/c/program\ files/java/jdk1.8.0_151
export PJ2AWS_HOME=/usr/src/pj2aws
export PATH=$CUDA_HOME/../Display.Driver:$PATH
rm -f $CUDA_HOME/../Display.Driver/nvcuda.dll
/cygdrive/c/windows/system32/expand \
`cygpath -w $CUDA_HOME/../Display.Driver/nvcuda64.dl_` \
`cygpath -w $CUDA_HOME/../Display.Driver/nvcuda.dll`
# CUDA Toolkit 8
rm -f $CUDA_HOME/../Display.Driver/nvfatbinaryloader.dll
/cygdrive/c/windows/system32/expand \
`cygpath -w $CUDA_HOME/../Display.Driver/nvfatbinaryloader64.dl_` \
`cygpath -w $CUDA_HOME/../Display.Driver/nvfatbinaryloader.dll`
x86_64-w64-mingw32-gcc -Wall -shared \
-I$PJ2AWS_HOME/pj2/lib \
-I"$JAVA_HOME/include" \
-I"$JAVA_HOME/include/win32" \
-I$CUDA_HOME/include \
-o EduRitGpuCuda.dll $PJ2AWS_HOME/pj2/lib/edu_rit_gpu_Cuda.c \
-L$CUDA_HOME/lib/x64 -lcuda
# CUDA Toolkit 11.1
x86_64-w64-mingw32-gcc -Wall -shared \
-I$PJ2AWS_HOME/pj2/lib \
-I"$JAVA_HOME/include" \
-I"$JAVA_HOME/include/win32" \
-I$CUDA_HOME/include \
-I$CUDA_HOME/../../cuda_cudart/cudart/include \
-o EduRitGpuCuda.dll $PJ2AWS_HOME/pj2/lib/edu_rit_gpu_Cuda.c \
-L$CUDA_HOME/../../cuda_cudart/cudart/lib/x64 -lcuda
ldd EduRitGpuCuda.dll
Clue(s): (a) Check PATH if ldd does not show the expected DLLs. (b) If ldd reports "???" instead of names, run strings <path> | grep -i dll on each DLL file, search the output for DLLs, and check whether they exist and can be found. Continue recursively until the "???" entries are gone.
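Clue (b) can be scripted. The sketch below is illustrative only (dll_refs and check_refs are names invented here): it extracts DLL names from the printable strings of a binary and checks each one against PATH, which is where the Windows loader looks.

```shell
#!/bin/sh
# dll_refs FILE -- list the DLL names referenced inside a binary by
# scanning its printable strings; a fallback when ldd prints "???".
dll_refs() {
    strings "$1" | grep -oiE '[a-z0-9_.-]+\.dll' | sort -u
}

# check_refs FILE -- for every referenced DLL, report whether some
# directory on PATH contains it. Re-run on the DLLs it finds until
# all "???" entries are explained.
check_refs() {
    for dll in $(dll_refs "$1"); do
        hit=$(IFS=:; for d in $PATH; do
                  [ -f "$d/$dll" ] && { echo "$d/$dll"; break; }
              done)
        if [ -n "$hit" ]; then
            echo "ok      $dll -> $hit"
        else
            echo "MISSING $dll"
        fi
    done
}
```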
...install Visual Studio Community 2015 and select the C++ programming language in the installation wizard. Start a Developer Command Prompt to get a proper VS environment and run NVCC on a .cu file. CUDA Toolkit 11 needs Visual Studio Community 2017 or above and an x86 Native Tools or Cross Tools Command Prompt; the Developer Command Prompt as used for the former Toolkits makes NVCC spit out lots of compilation errors.
rem turn off file name completion due to TABs in commands below.
cmd /f:off
rem CUDA Toolkit 8
set CUDA_HOME=%userprofile%\lab\cudacons\cuda_8.0.44_windows\compiler
set PATH=%CUDA_HOME%\bin;%PATH%
cd %userprofile%\lab\cudacons
nvcc -Wno-deprecated-gpu-targets -ptx ^
-I%CUDA_HOME%\..\CUDASamples\common\inc ^
-o clock.ptx %CUDA_HOME%\..\CUDASamples\0_Simple\clock\clock.cu
rem CUDA Toolkit 11.1
set CUDA_HOME=%userprofile%\lab\cudacons\cuda_11.1.1_456.81_win10\cuda_nvcc\nvcc
set PATH=%CUDA_HOME%\bin;%PATH%
cd %userprofile%\lab\cudacons
nvcc -Wno-deprecated-gpu-targets -ptx ^
-I%CUDA_HOME%\..\..\cuda_cudart\cudart\include ^
-I%CUDA_HOME%\..\..\cuda_Samples\cuda_Samples\common\inc ^
-o clock.ptx %CUDA_HOME%\..\..\cuda_Samples\cuda_Samples\0_Simple\clock\clock.cu
Maybe LLVM can do the trick as well...
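An untested sketch of that idea: clang ships its own CUDA frontend and an NVPTX backend, so in principle it can emit PTX without nvcc, following the scheme in LLVM's "Compiling CUDA with clang" documentation. The flags below are assumptions: --cuda-path must point at a complete toolkit layout, and sm_35 is a placeholder architecture; adjust both.

```shell
# Untested: clang standing in for nvcc (Toolkit 8 layout from above
# assumed; sm_35 is a placeholder GPU architecture).
# --cuda-device-only plus -S makes clang emit PTX assembly.
clang++ --cuda-device-only --cuda-gpu-arch=sm_35 \
    --cuda-path=$CUDA_HOME \
    -I$CUDA_HOME/../CUDASamples/common/inc \
    -S -o clock.ptx $CUDA_HOME/../CUDASamples/0_Simple/clock/clock.cu
```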
Almost the same as on Windows, but without Visual Studio of course. Make sure the GCC compiler suite and development tools are available; on Red Hat Linux consider installing the "Development Tools" group. Download the runfile of the CUDA Toolkit and extract its content.
# Install GNU C development tools
sudo yum groupinstall "Development Tools"
# Download CUDA runfile. Adjust URL as appropriate.
wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run
# Unpack CUDA runfile
sh cuda_8.0.61_375.26_linux-run --extract=`pwd`
# Unpack CUDA driver in CWD
sh NVIDIA-Linux-x86_64-375.26.run --extract-only
# Install CUDA toolkit and samples
sudo sh cuda-linux64-rel-8.0.61-21551265.run
sudo sh cuda-samples-linux-8.0.61-21551265.run
...run the commands in the box below. This builds one of the CUDA samples; the result should run on any CUDA-capable machine. Use ldd to check dynamic references.
export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
g++ -I$CUDA_HOME/include \
-I/usr/local/cuda-8.0/samples/common/inc \
-o deviceQuery /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery/deviceQuery.cpp \
-L$CUDA_HOME/lib64 -lcudart
ldd ./deviceQuery
...run the commands in the box below. As described above in the section on Windows, this builds the native library from PJ2.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PJ2AWS_HOME=../pj2aws
export LD_LIBRARY_PATH=$PJ2AWS_HOME/pj2/lib:NVIDIA-Linux-x86_64-375.26:$LD_LIBRARY_PATH
( cd NVIDIA-Linux-x86_64-375.26 ; ln -s libcuda.so.375.26 libcuda.so ; ln -s libcuda.so.375.26 libcuda.so.1 )
gcc -Wall -shared -fPIC \
-I$PJ2AWS_HOME/pj2/lib \
-I$JAVA_HOME/include \
-I$JAVA_HOME/include/linux \
-I$CUDA_HOME/include \
-o libEduRitGpuCuda.so $PJ2AWS_HOME/pj2/lib/edu_rit_gpu_Cuda.c \
-LNVIDIA-Linux-x86_64-375.26 -lcuda
ldd libEduRitGpuCuda.so
...add NVCC to PATH and run the commands in the box below.
export PATH=$CUDA_HOME/bin:$PATH
nvcc -Wno-deprecated-gpu-targets -ptx \
-I/usr/local/cuda-8.0/samples/common/inc \
-o clock.ptx /usr/local/cuda-8.0/samples/0_Simple/clock/clock.cu