Cannot run Examples using Cuda on V100
jtchilders opened this issue · 9 comments
I am attempting to run Exercise 02 from Intro-Full on an NVIDIA V100.
I have cloned kokkos and kokkos-tutorials in my $HOME/Kokkos directory.
I have GCC 8.2.0 and Cuda 10.1.243 in my LD_LIBRARY_PATH/PATH.
I go into kokkos-tutorials/Intro-Full/Exercises/02/Begin and run:
make -j KOKKOS_DEVICES=Cuda
make -j KOKKOS_DEVICES=OpenMP
Both builds complete without error.
The host binary runs fine. However, running the CUDA binary fails:
dgx2 { ~/Kokkos/kokkos-tutorials/Intro-Full/Exercises/02/Begin }-> CUDA_VISIBLE_DEVICES=0 ./02_Exercise.cuda -S 26
User S is 67108864
Total size S = 67108864 N = 65536 M = 1024
Error: result( 0.000000 ) != solution( 67108864.000000 )
terminate called after throwing an instance of 'std::runtime_error'
what(): cudaFuncSetCacheConfig( cuda_parallel_launch_local_memory<DriverType>, (prefer_shmem ? cudaFuncCachePreferShared : cudaFuncCachePreferL1)) error( cudaErrorIllegalAddress): an illegal memory access was encountered /home/parton/Kokkos/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:426
Traceback functionality not available
Aborted
My first assumption is that I've done something wrong, but I can't figure out what, as I was directly following the tutorial from ATPESC last summer on YouTube.
If you compare the Makefiles in Begin and Solution, you will see that the Solution adds the "force_uvm" option to the compiler options. The code accesses the views on both host and device, so you need a memory space that is accessible from both sides. Since we do not discuss memory spaces at that stage of the tutorial yet, we ask folks to add the force_uvm option (maybe we should just add it though …)
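One way to confirm that a Makefile build actually picked up force_uvm is to look at the generated KokkosCore_config.h (the file that make clean removes in the logs below). The following is a minimal sketch, not part of the tutorial: the helper name uvm_enabled is hypothetical, and it assumes the build writes a KOKKOS_ENABLE_CUDA_UVM define into that header when force_uvm is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given Kokkos config header enables UVM.
# Assumption: force_uvm results in a KOKKOS_ENABLE_CUDA_UVM define in the header.
uvm_enabled() {
  grep -q 'KOKKOS_ENABLE_CUDA_UVM' "$1"
}

# Run from the exercise directory after building.
if uvm_enabled KokkosCore_config.h 2>/dev/null; then
  echo "UVM enabled"
else
  echo "UVM not enabled (or header not found)"
fi
```

If the header does not contain the define, rebuilding after make clean with KOKKOS_CUDA_OPTIONS=force_uvm,enable_lambda is the option named above.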
Thanks Christian, I should have included that I tried both builds:
make -j KOKKOS_DEVICES=Cuda
make -j KOKKOS_DEVICES=Cuda KOKKOS_CUDA_OPTIONS=force_uvm,enable_lambda
The full experiment with the second build is below:
parton::dgx2 { ~/Kokkos/kokkos-tutorials/Intro-Full/Exercises/02/Begin }-> make clean
rm -f Kokkos_HostBarrier.o Kokkos_ExecPolicy.o Kokkos_HostThreadTeam.o Kokkos_Spinwait.o Kokkos_HostSpace_deepcopy.o Kokkos_Profiling_Interface.o Kokkos_hwloc.o Kokkos_CPUDiscovery.o Kokkos_SharedAlloc.o Kokkos_MemoryPool.o Kokkos_Error.o Kokkos_Core.o Kokkos_Stacktrace.o Kokkos_HostSpace.o Kokkos_UnorderedMap_impl.o Kokkos_OpenMP_Task.o Kokkos_OpenMP_Exec.o KokkosCore_config.h KokkosCore_config.tmp libkokkos.a
rm -f *.o *.cuda *.host
parton::dgx2 { ~/Kokkos/kokkos-tutorials/Intro-Full/Exercises/02/Begin }-> ls
CMakeLists.txt exercise_2_begin.cpp Makefile
parton::dgx2 { ~/Kokkos/kokkos-tutorials/Intro-Full/Exercises/02/Begin }-> make -j KOKKOS_DEVICES=Cuda KOKKOS_CUDA_OPTIONS=force_uvm,enable_lambda
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c exercise_2_begin.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_HostBarrier.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_ExecPolicy.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_HostThreadTeam.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Serial_Task.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Serial.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Spinwait.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_HostSpace_deepcopy.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_hwloc.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_CPUDiscovery.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_SharedAlloc.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_MemoryPool.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Error.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Core.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Stacktrace.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_HostSpace.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/containers/src/impl/Kokkos_UnorderedMap_impl.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/Cuda/Kokkos_Cuda_Locks.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_70 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp
ar cr libkokkos.a Kokkos_HostBarrier.o Kokkos_ExecPolicy.o Kokkos_HostThreadTeam.o Kokkos_Serial_Task.o Kokkos_Serial.o Kokkos_Spinwait.o Kokkos_HostSpace_deepcopy.o Kokkos_Profiling_Interface.o Kokkos_hwloc.o Kokkos_CPUDiscovery.o Kokkos_SharedAlloc.o Kokkos_MemoryPool.o Kokkos_Error.o Kokkos_Core.o Kokkos_Stacktrace.o Kokkos_HostSpace.o Kokkos_UnorderedMap_impl.o Kokkos_Cuda_Instance.o Kokkos_Cuda_Locks.o Kokkos_CudaSpace.o Kokkos_Cuda_Task.o
ranlib libkokkos.a
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -arch=sm_70 -L/home/parton/Kokkos/kokkos-tutorials/Intro-Full/Exercises/02/Begin -L/soft/compilers/cuda/cuda-10.1.243/lib64 exercise_2_begin.o -lkokkos -ldl -lcudart -lcuda -o "02_Exercise".cuda
echo "Start Build"
Start Build
parton::dgx2 { ~/Kokkos/kokkos-tutorials/Intro-Full/Exercises/02/Begin }-> CUDA_VISIBLE_DEVICES=0 ./02_Exercise.cuda -S 26
User S is 67108864
Total size S = 67108864 N = 65536 M = 1024
Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default
without setting CUDA_LAUNCH_BLOCKING=1.
The code must call Cuda().fence() after each kernel
or will likely crash when accessing data on the host.
Error: result( 0.000000 ) != solution( 67108864.000000 )
terminate called after throwing an instance of 'std::runtime_error'
what(): cudaFuncSetCacheConfig( cuda_parallel_launch_local_memory<DriverType>, (prefer_shmem ? cudaFuncCachePreferShared : cudaFuncCachePreferL1)) error( cudaErrorIllegalAddress): an illegal memory access was encountered /home/parton/Kokkos/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:426
Traceback functionality not available
Aborted
I tried to just run the solution out of the box on the Cooley (K80) cluster at Argonne and got a similar result:
parton::cc012 { ~/Kokkos/kokkos-tutorials/Intro-Full/Exercises/01/Solution }-> make clean
rm -f Kokkos_HostBarrier.o Kokkos_ExecPolicy.o Kokkos_HostThreadTeam.o Kokkos_Spinwait.o Kokkos_HostSpace_deepcopy.o Kokkos_Profiling_Interface.o Kokkos_hwloc.o Kokkos_CPUDiscovery.o Kokkos_SharedAlloc.o Kokkos_MemoryPool.o Kokkos_Error.o Kokkos_Core.o Kokkos_Stacktrace.o Kokkos_HostSpace.o Kokkos_UnorderedMap_impl.o Kokkos_OpenMP_Task.o Kokkos_OpenMP_Exec.o KokkosCore_config.h KokkosCore_config.tmp libkokkos.a
rm -f *.o *.cuda *.host
parton::cc012 { ~/Kokkos/kokkos-tutorials/Intro-Full/Exercises/01/Solution }-> ls
CMakeLists.txt exercise_1_solution.cpp Makefile
parton::cc012 { ~/Kokkos/kokkos-tutorials/Intro-Full/Exercises/01/Solution }-> nvidia-smi -L
GPU 0: Tesla K80 (UUID: GPU-74302287-3491-efac-9ad4-6df0d8ea4e65)
GPU 1: Tesla K80 (UUID: GPU-99571b73-d5fc-670d-d3fd-fd2b768046df)
parton::cc012 { ~/Kokkos/kokkos-tutorials/Intro-Full/Exercises/01/Solution }-> make -j KOKKOS_DEVICES=Cuda KOKKOS_CUDA_OPTIONS=force_uvm,enable_lambda KOKKOS_ARCH=Kepler37
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c exercise_1_solution.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_HostBarrier.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_ExecPolicy.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_HostThreadTeam.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Serial_Task.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Serial.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Spinwait.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_HostSpace_deepcopy.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Profiling_Interface.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_hwloc.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_CPUDiscovery.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_SharedAlloc.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_MemoryPool.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Error.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Core.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_Stacktrace.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/impl/Kokkos_HostSpace.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/containers/src/impl/Kokkos_UnorderedMap_impl.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/Cuda/Kokkos_Cuda_Locks.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti --std=c++11 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -expt-extended-lambda -arch=sm_37 -I./ -I/home/parton/Kokkos/kokkos/core/src -I/home/parton/Kokkos/kokkos/containers/src -I/home/parton/Kokkos/kokkos/algorithms/src -I/home/parton/Kokkos/kokkos/core/src/eti -O3 -c /home/parton/Kokkos/kokkos/core/src/Cuda/Kokkos_Cuda_Task.cpp
ar cr libkokkos.a Kokkos_HostBarrier.o Kokkos_ExecPolicy.o Kokkos_HostThreadTeam.o Kokkos_Serial_Task.o Kokkos_Serial.o Kokkos_Spinwait.o Kokkos_HostSpace_deepcopy.o Kokkos_Profiling_Interface.o Kokkos_hwloc.o Kokkos_CPUDiscovery.o Kokkos_SharedAlloc.o Kokkos_MemoryPool.o Kokkos_Error.o Kokkos_Core.o Kokkos_Stacktrace.o Kokkos_HostSpace.o Kokkos_UnorderedMap_impl.o Kokkos_Cuda_Instance.o Kokkos_Cuda_Locks.o Kokkos_CudaSpace.o Kokkos_Cuda_Task.o
ranlib libkokkos.a
/home/parton/Kokkos/kokkos/bin/nvcc_wrapper -arch=sm_37 -L/gpfs/mira-home/parton/Kokkos/kokkos-tutorials/Intro-Full/Exercises/01/Solution -L/soft/visualization/cuda-10.1/lib64 exercise_1_solution.o -lkokkos -ldl -lcudart -lcuda -o "01_Exercise".cuda
echo "Start Build"
Start Build
parton::cc012 { ~/Kokkos/kokkos-tutorials/Intro-Full/Exercises/01/Solution }-> CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0 ./01_Exercise.cuda -S 26
User S is 67108864
Total size S = 67108864 N = 65536 M = 1024
terminate called after throwing an instance of 'std::runtime_error'
what(): cudaFuncGetAttributes( &attr, cuda_parallel_launch_local_memory<DriverType>) error( cudaErrorIllegalAddress): an illegal memory access was encountered /home/parton/Kokkos/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:448
Traceback functionality not available
Aborted
I attempted to use CMake as well, following these steps, and got an architecture mismatch error:
parton::gpu01 { ~/Kokkos/kokkos_cmake_build }-> cmake ../kokkos -DCMAKE_CXX_COMPILER=$HOME/Kokkos/kokkos/bin/nvcc_wrapper \
-DCMAKE_C_COMPILER=$HOME/Kokkos/kokkos/bin/nvcc_wrapper \
-DKokkos_ENABLE_CUDA=ON -DKokkos_ENABLE_OPENMP=ON \
-DKokkos_ARCH_VOLTA72=On \
-DCMAKE_INSTALL_PREFIX=$HOME/Kokkos/install \
-DKokkos_ENABLE_CUDA_LAMBDA=On
parton::gpu01 { ~/Kokkos/kokkos_cmake_build }-> make -j install
parton::gpu01 { ~/Kokkos/kokkos-tutorials/Intro-Full/Exercises/02/sbuild }-> cmake ../Solution \
-DCMAKE_PREFIX_PATH=$HOME/Kokkos/install/lib64/cmake \
-DCMAKE_CXX_COMPILER=$HOME/Kokkos/kokkos/bin/nvcc_wrapper \
-DCMAKE_C_COMPILER=$(which gcc) -DKokkos_ENABLE_CUDA=ON \
-DKokkos_ENABLE_OPENMP=ON -DKokkos_ARCH_VOLTA72=On \
-DCMAKE_INSTALL_PREFIX=$HOME/Kokkos/install \
-DKokkos_ENABLE_CUDA_LAMBDA=On
parton::gpu01 { ~/Kokkos/kokkos-tutorials/Intro-Full/Exercises/02/sbuild }-> make -j
Scanning dependencies of target 02_Exercise
[ 50%] Building CXX object CMakeFiles/02_Exercise.dir/exercise_2_solution.cpp.o
[100%] Linking CXX executable 02_Exercise
[100%] Built target 02_Exercise
parton::gpu01 { ~/Kokkos/kokkos-tutorials/Intro-Full/Exercises/02/sbuild }-> ./02_Exercise
Total size S = 4194304 N = 4096 M = 1024
Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set
In general, for best performance with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads
For best performance with OpenMP 3.1 set OMP_PROC_BIND=true
For unit testing set OMP_PROC_BIND=false
Kokkos::Cuda::initialize ERROR: likely mismatch of architecture
Aborted
Sorry, I hit the wrong button and closed the issue prematurely.
OK, for the CMake build you also need Kokkos_ENABLE_CUDA_UVM=On. In addition, a V100 is Kokkos_ARCH_VOLTA70, not 72, which is why you get the architecture mismatch. Exercise 1 never runs on GPUs, since it doesn't fix the allocations. And as I said, the Begin Makefile of Exercise 02 needs force_uvm added. From Exercise 3 onwards, both Begin and Solution should run out of the box on the GPU. For Exercise 02, if you go to the Solution directory and type make KOKKOS_DEVICES=Cuda, it should run out of the box on x86 with a V100. On Power systems like Summit you potentially need to add KOKKOS_ARCH=Volta70,Power9.
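To make the 70-versus-72 point concrete, here is a small sketch that maps a GPU's CUDA compute capability to the matching Kokkos CMake architecture option. The function name kokkos_arch_for_cc is hypothetical and the table covers only the architectures that come up in this thread; the K80 and V100 entries match the -arch=sm_37 and -arch=sm_70 flags in the build logs above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a CUDA compute capability (e.g. "7.0") to the
# Kokkos CMake architecture option. Only the cases from this thread are listed.
kokkos_arch_for_cc() {
  case "$1" in
    3.7) echo "Kokkos_ARCH_KEPLER37" ;;  # Tesla K80, built with -arch=sm_37
    7.0) echo "Kokkos_ARCH_VOLTA70" ;;   # Tesla V100, built with -arch=sm_70
    7.2) echo "Kokkos_ARCH_VOLTA72" ;;   # a different Volta variant, not the V100
    *)   echo "unknown compute capability: $1" >&2; return 1 ;;
  esac
}

# Example: the V100 has compute capability 7.0.
kokkos_arch_for_cc 7.0
```

The result would be passed to CMake as, for example, -DKokkos_ARCH_VOLTA70=On. Failing on unknown values is a deliberate choice in this sketch, since a wrong guess only surfaces at run time as the "likely mismatch of architecture" error.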
OK, thank you, Christian. The change from 72 to 70 was the key. I had found Kokkos_ENABLE_CUDA_UVM=On after my previous post. I have Exercise 03 working.
So, for completeness, my cmake command for a V100 + Xeon Gold 6152 was the following:
cmake ../kokkos -DCMAKE_CXX_COMPILER=$HOME/Kokkos/kokkos/bin/nvcc_wrapper \
-DKokkos_ENABLE_CUDA=ON \
-DKokkos_ENABLE_OPENMP=ON \
-DKokkos_ARCH_VOLTA70=On \
-DCMAKE_INSTALL_PREFIX=$HOME/Kokkos/install \
-DKokkos_ENABLE_CUDA_LAMBDA=On \
-DKokkos_ENABLE_CUDA_UVM=On
On Power systems like Summit you need to add KOKKOS_ARCH=Volta70,Power8 potentially.
For future reference: Summit is POWER9.
ah typo … editing the post.