GPU configuration: The problem is maybe -arch sm_13 instead of -arch sm_11 in the Makefile, please doublecheck
trapprb8 opened this issue · 10 comments
When I start a simulation in gpu mode I get the following error message:
Error in setConst_hprime_xx: invalid device symbol
The problem is maybe -arch sm_13 instead of -arch sm_11 in the Makefile, please doublecheck
I am trying to configure for a Quadro P4000, which should be Pascal architecture, so I guess cuda8 should be used in the configuration (according to the overview in the Makefile, see below)?
I used the following commands:
$ ./configure FC=gfortran CC=gcc --with-mpi MPIFC=mpif90 USE_BUNDLED_SCOTCH=1 --with-cuda=cuda8 CUDA_LIB=/usr/local/cuda/lib64
$ make
Overview in makefile:
# CUDA architecture / code version
# Fermi (not supported): -gencode=arch=compute_10,code=sm_10
# Tesla (Tesla C2050, GeForce GTX 480): -gencode=arch=compute_20,code=sm_20
# Tesla (cuda4, K10, Geforce GTX 650, GT 650m): -gencode=arch=compute_30,code=sm_30
# Kepler (cuda5, K20) : -gencode=arch=compute_35,code=sm_35
# Kepler (cuda6.5, K80): -gencode=arch=compute_37,code=sm_37
# Maxwell (cuda6.5+/cuda7, Quadro K2200): -gencode=arch=compute_50,code=sm_50
# Pascal (cuda8,P100, GeForce GTX 1080, Titan): -gencode=arch=compute_60,code=sm_60
# Volta (cuda9, V100): -gencode=arch=compute_70,code=sm_70
# Turing (cuda10, T4, GeForce RTX 2080): -gencode=arch=compute_75,code=sm_75
# Ampere (cuda11, A100, GeForce RTX 3080): -gencode=arch=compute_80,code=sm_80
# Hopper (cuda12, H100): -gencode=arch=compute_90,code=sm_90
the Quadro P4000 has CUDA compute capability 6.1. that means you will likely have to modify the Makefile a bit after configuration and instead of
-gencode=arch=compute_60,code=sm_60
use:
-gencode=arch=compute_61,code=sm_61
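The mapping used in this reply (compute capability 6.1 becomes compute_61 / sm_61) can be sketched as a small shell helper. The function name gencode_flag is hypothetical and not part of the code or of CUDA; it only formats the flag string and does not check whether the installed toolkit supports that architecture.

```shell
# Hypothetical helper: turn a compute capability such as "6.1"
# into the matching nvcc -gencode flag by dropping the dot.
gencode_flag() {
  digits=$(printf '%s' "$1" | tr -d '.')
  printf '%s\n' "-gencode=arch=compute_${digits},code=sm_${digits}"
}

gencode_flag 6.1
# prints: -gencode=arch=compute_61,code=sm_61
```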
Thank you for your answer! :)
Unfortunately, that didn't work yet, the error stays the same.
What I did now was:
in Makefile.in:
GENCODE_60 = -gencode=arch=compute_61,code=\"sm_61,compute_61\"
in Makefile:
GENCODE_60 = -gencode=arch=compute_61,code=\"sm_61,compute_61\"
GENCODE = $(GENCODE_60) $(FC_DEFINE)GPU_DEVICE_Pascal #this line stays same, just wanted to show for completion
and run
$ ./configure FC=gfortran CC=gcc --with-mpi MPIFC=mpif90 USE_BUNDLED_SCOTCH=1 --with-cuda=cuda8 CUDA_LIB=/usr/local/cuda/lib64
$ make
great, thanks for the quick feedback!
note that the Makefile gets created by running the ./configure script. so, you would only need to modify either the Makefile.in before running the configuration, or the Makefile after running the configuration.
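The edit described here can also be scripted. The following is a minimal sketch that rewrites the GENCODE_60 line with sed. It runs against a scratch file rather than the real Makefile, and the assumed starting line (compute_60 / sm_60, in the same form as the modified line quoted earlier in the thread) may not match the actual Makefile.in exactly.

```shell
# Scratch file standing in for Makefile or Makefile.in (assumed content).
tmp=$(mktemp)
printf '%s\n' 'GENCODE_60 = -gencode=arch=compute_60,code=\"sm_60,compute_60\"' > "$tmp"

# Rewrite compute_60/sm_60 to compute_61/sm_61, but only on the
# line that defines GENCODE_60; the variable name itself is untouched.
sed -e '/^GENCODE_60[[:space:]]*=/ s/compute_60/compute_61/g' \
    -e '/^GENCODE_60[[:space:]]*=/ s/sm_60/sm_61/g' "$tmp" > "$tmp.new"

cat "$tmp.new"
```

Pointing the same sed expressions at Makefile.in (before ./configure) or at Makefile (after it) would apply the change to the real file; keeping a backup copy first seems prudent.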
Hi Daniel, thanks again! :)
I also did this, however it does not work. Still the same error.
We are only talking about the Makefile.in and Makefile in the main directory, right?
I uploaded the two files:
yes, the GPU architecture is specified only in the main Makefiles in the root directory: Makefile.in and the generated Makefile.
can you be more specific about what did not work: the compilation even with the modifications you described, or the modification of only one of the Makefiles? that is, do you still get the error
Error in setConst_hprime_xx: invalid device symbol
even with the modification
GENCODE_60 = -gencode=arch=compute_61,code=\"sm_61,compute_61\"
in these Makefiles? if so, then what are your CUDA toolkit and CUDA driver versions?
Exactly, the error is the same as before:
Error in setConst_hprime_xx: invalid device symbol
The problem is maybe -arch sm_13 instead of -arch sm_11 in the Makefile, please doublecheck
The CUDA version is 11.8; nvcc --version gives me:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
could you also add the output of the command nvidia-smi to see the driver version on your system?
This output is:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P4000        On   | 00000000:05:00.0  On |                  N/A |
| 46%   30C    P0    28W / 105W |    240MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2980      G   /usr/lib/xorg/Xorg                192MiB |
|    0   N/A  N/A      3511      G   cinnamon                           30MiB |
|    0   N/A  N/A      4673      G   /usr/lib/firefox/firefox           13MiB |
+-----------------------------------------------------------------------------+
tricky... according to the toolkit documentation, that driver version looks okay for CUDA 11.8 and it should support the compute capability 6.1. unfortunately, I can't reproduce it as I don't have access to such a GPU card. the code works however on most older and newer cards, so I would expect this to be a driver version and CUDA toolkit issue.
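The toolkit/driver comparison made here can be sketched in shell. The "CUDA Version" field in the nvidia-smi header is the highest CUDA version the installed driver supports, so a toolkit release at or below it should be fine. The sample strings below are copied from this thread; on a live system they would come from `nvcc --version` and `nvidia-smi`. The variable names are illustrative only, and `sort -V` is assumed to be available (it is in GNU coreutils).

```shell
# Sample text copied from the thread (stand-ins for the live commands).
nvcc_out='Cuda compilation tools, release 11.8, V11.8.89'
smi_out='| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |'

# Extract the toolkit release and the driver's maximum supported CUDA version.
toolkit=$(printf '%s\n' "$nvcc_out" | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
driver_cuda=$(printf '%s\n' "$smi_out" | sed -n 's/.*CUDA Version: \([0-9.]*\).*/\1/p')

echo "toolkit=$toolkit driver_supports_up_to=$driver_cuda"

# Version-aware sort: the toolkit is acceptable if it is not newer
# than the version reported by the driver.
newest=$(printf '%s\n%s\n' "$toolkit" "$driver_cuda" | sort -V | tail -n 1)
if [ "$newest" = "$driver_cuda" ]; then
  echo "OK: driver supports this toolkit"
else
  echo "Mismatch: toolkit is newer than the driver supports"
fi
```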
to double check the compute capability of your card, could you compile and run the little helper tool in the utils/GPU_tools/ folder on your system:
cd ~/<specfem-directory>/utils/GPU_tools/
nvcc --gpu-architecture=sm_60 -o check_cuda_device check_cuda_device.cu
./check_cuda_device
the tool will provide an info output with the compute capability listed.
in the past, somebody on the CIG-seismo forum was able to run the code on a Quadro P6000, I think with CUDA 9.1. you could try to downgrade the CUDA driver & runtime version to see if this solves the issue.
Hi dear,
here is the output of the helper tool:
```
found number of CUDA devices = 1
GPU device id: 0
Device Name = Quadro P4000
memory:
totalGlobalMem (in MB, dividing by powers of 1024): 8116.562500
totalGlobalMem (in GB, dividing by powers of 1024): 7.926331
totalGlobalMem (in MB, dividing by powers of 1000): 8510.833008
totalGlobalMem (in GB, dividing by powers of 1000): 8.510833
sharedMemPerBlock (in bytes): 49152
blocks:
Maximum number of registers per block: 65536
Maximum number of threads per block: 1024
Maximum size of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
features:
Compute capability of the device = 6.1
multiProcessorCount: 14
canMapHostMemory: TRUE
deviceOverlap: TRUE
0: GPU memory usage (dividing by powers of 1024): used = 319.625000 MB, free = 7796.937500 MB, total = 8116.562500 MB
0: GPU memory usage (dividing by powers of 1000): used = 335.151104 MB, free = 8175.681536 MB, total = 8510.832640 MB
number of total devices: 1
```
Ok.. Maybe I will try to downgrade the Cuda Toolkit then!
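A side note on the helper run above: it was built with --gpu-architecture=sm_60 and still ran on the 6.1 device. That is consistent with the general binary compatibility rule in the CUDA programming guide, where a cubin built for compute capability X.y runs on devices of capability X.z with z >= y. The sketch below encodes only that general rule; the function name cubin_runs_on is hypothetical, and architecture-specific targets and PTX just-in-time compilation are not covered.

```shell
# Succeeds if a cubin built for capability $1 can run on a device of
# capability $2: same major version, device minor >= built minor.
cubin_runs_on() {
  built_major=${1%%.*}; built_minor=${1#*.}
  dev_major=${2%%.*};   dev_minor=${2#*.}
  [ "$built_major" -eq "$dev_major" ] && [ "$dev_minor" -ge "$built_minor" ]
}

if cubin_runs_on 6.0 6.1; then echo "sm_60 binary: compatible with 6.1"; fi
if ! cubin_runs_on 7.0 6.1; then echo "sm_70 binary: not compatible with 6.1"; fi
```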