LAMMPS-GPU benchmark: Cuda driver error 4 in call at file 'geryon/nvd_device.h'
DaveiV opened this issue · 4 comments
Summary
Got a CUDA driver error when running:
mpirun --allow-run-as-root -n 32 lmp -sf gpu -pk gpu 2 -restart2data lmp.restart remap lmp_final.data
LAMMPS Version and Platform
LAMMPS version 20230802.1 and 20230802.2
Details
Noted:
- I got this error when running LAMMPS versions 20230802.1 and 20230802.2 with CUDA.
- But when running LAMMPS versions 20230802.1 and 20230802.2 with OpenCL, no error appears; the program runs smoothly, but performance decreases by about 20%.
- When I run LAMMPS version 20220623.4 with CUDA, the error does not appear.
Comparing LAMMPS version 20230802 with version 20220623, the file lib/gpu/geryon/nvd_device.h differs: some code was added to this file.
-Version 20230802
void UCL_Device::clear() {
  if (_device > -1) {
    for (int i = 1; i < num_queues(); i++)
      pop_command_queue();
#if GERYON_NVD_PRIMARY_CONTEXT
    CU_SAFE_CALL_NS(cuCtxSetCurrent(_old_context));
    CU_SAFE_CALL_NS(cuDevicePrimaryCtxRelease(_cu_device));
#else
    cuCtxDestroy(_context);
#endif
    _device = -1;
  }
}
-Version 20220623
void UCL_Device::clear() {
  if (_device > -1) {
    for (int i = 1; i < num_queues(); i++)
      pop_command_queue();
    cuCtxDestroy(_context);
    _device = -1;
  }
}
Currently I'm using Spack to build LAMMPS:
spack graph lammps@20230802.1%aocc@4.1.0+cuda cuda_arch=90 fftw_precision=single target=zen4 +extra-dump +granular +kspace +manybody +meam +molecule +opt +replica +rigid +openmp +openmp-package ^amdfftw %aocc@4.1.0 ^ucx@1.15.0 %aocc@4.1.0 +xpmem+verbs+ud+rc+mlx5_dv+cuda cuda_arch=80 ^openmpi@4.1.5 %aocc@4.1.0 +cuda cuda_arch=80 fabrics=ucx
mpirun --allow-run-as-root -n 32 lmp -sf gpu -pk gpu 2 -restart2data lmp.restart remap lmp_final.data
There is no GPU package acceleration used for this command, so the only difference you are measuring in terms of time is the one-time initialization of the GPU.
In fact, there is little benefit from using MPI parallelization here at all (and there is little benefit in general from using 16 MPI processes per GPU either, unless you compile and run with CUDA Multi-Process Service support via -DCUDA_MPS_SUPPORT).
So please test with the in.lj, in.rhodo, or in.eam examples in the bench folder, and use only 2-8 MPI processes.
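The suggestion above could look like this on the command line (a sketch; the lammps/bench path and the number of processes are assumptions, adjust to your installation):

```shell
# Run a standard benchmark input with the GPU package on 2 GPUs,
# using a modest number of MPI processes.
cd lammps/bench
mpirun -n 4 lmp -sf gpu -pk gpu 2 -in in.lj
```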
Dear @akohlmey,
About command
mpirun --allow-run-as-root -n 32 lmp -sf gpu -pk gpu 2 -restart2data lmp.restart remap lmp_final.data
May I confirm that you mean this command does not support GPU package acceleration; it only works on the CPU, right?
And why does this command run without error with LAMMPS version 20220623, but produce this error with version 20230802?
About LAMMPS with MPI parallelization: I will try -DCUDA_MPS_SUPPORT after fixing the error above.
I have tested and successfully run the in.lj, in.rhodo, and in.eam files.
Thank you
May I confirm that you mean this command does not support GPU package acceleration.
It does not use GPU acceleration. The -restart2data command-line flag, as used in your example, is equivalent to an input file with:
read_restart lmp.restart remap
write_data lmp_final.data noinit
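In other words, for a pure restart-to-data conversion the GPU flags can simply be dropped (a sketch based on the command from the report; file names are taken from that example):

```shell
# Same conversion, without the pointless GPU setup; a single process
# is enough since the operation is I/O bound.
mpirun -n 1 lmp -restart2data lmp.restart remap lmp_final.data
```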
It only works with the CPU, right?
It sets up a calculation with GPU acceleration, but then does not properly initialize everything and never uses the GPU. The operations in use are heavily I/O bound and thus not parallelizable, and enabling the GPU for this leads to undefined behavior. LAMMPS might be changed to handle this case more gracefully, but first and foremost this is a user error: trying to enable functionality that (very obviously) has no meaning in this application.
One more comment: your use of the flag --allow-run-as-root suggests that you are running as root. This is a very, very, VERY bad idea. Running MPI as root makes no sense, since parallel applications are not a system-management task. In combination with LAMMPS it is particularly dangerous, since LAMMPS has facilities that can delete or modify files: one typo and you may destroy your entire installation and render the server unusable to the point of requiring a reinstallation from scratch.