AMReX-Fluids/IAMR

CUDA version reports errors on certain inputs

lingda-li opened this issue · 9 comments

Hi,

I compiled the CUDA version of the IAMR examples, but found that they report errors on certain inputs. For example, in IAMR/Exec/eb_run2d:
"./amr2d.gnu.MPI.CUDA.ex inputs.2d.double_shear_layer-rotate" runs correctly.
"./amr2d.gnu.MPI.CUDA.ex inputs.2d.flow_past_cylinder-x" reports the following errors:
No protocol specified
Initializing CUDA...
CUDA initialized with 1 GPU per MPI rank; 1 GPU(s) used in total
MPI initialized with 1 MPI processes
MPI initialized with thread support level 0
AMReX (21.12-dirty) initialized
xlo set to mass inflow.
xhi set to pressure outflow.
Warning: both amr.plot_int and amr.plot_per are > 0.!
NavierStokesBase::init_additional_state_types()::have_divu = 0
NavierStokesBase::init_additional_state_types()::have_dsdt = 0
NavierStokesBase::init_additional_state_types: num_state_type = 3
Initializing EB2 structs
Creating projector
Installing projector level 0
amrex::Abort::0::GPU last error detected in file ../../../amrex/Src/Base/AMReX_GpuLaunchFunctsG.H line 834: invalid device function !!!
SIGABRT
See Backtrace.0 file for details

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 6.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.

Backtrace.0 is as follows:
=== If no file names and line numbers are shown below, one can run
addr2line -Cpfie my_exefile my_line_address
to convert my_line_address (e.g., 0x4a6b) into file name and line number.
Or one can use amrex/Tools/Backtrace/parse_bt.py.

=== Please note that the line number reported by addr2line may not be accurate.
One can use
readelf -wl my_exefile | grep my_line_address'
to find out the offset for that line.

0: ./amr2d.gnu.MPI.CUDA.ex(+0x2f20b5) [0x561c797640b5]
amrex::BLBackTrace::print_backtrace_info(_IO_FILE*) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_BLBackTrace.cpp:179

1: ./amr2d.gnu.MPI.CUDA.ex(+0x2f3e35) [0x561c79765e35]
amrex::BLBackTrace::handler(int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_BLBackTrace.cpp:85

2: ./amr2d.gnu.MPI.CUDA.ex(+0x62265) [0x561c794d4265]
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_is_local() const at /usr/include/c++/9/bits/basic_string.h:222
(inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose() at /usr/include/c++/9/bits/basic_string.h:231
(inlined by) std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() at /usr/include/c++/9/bits/basic_string.h:658
(inlined by) amrex::Gpu::ErrorCheck(char const*, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_GpuError.H:54

3: ./amr2d.gnu.MPI.CUDA.ex(+0x7e41c) [0x561c794f041c]
amrex::Gpu::AsyncArray<amrex::Box, 0>::~AsyncArray() at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_GpuAsyncArray.H:64
(inlined by) void amrex::GpuBndryFuncFab::ccfcdoit<amrex::FilccCell>(amrex::Box const&, amrex::FArrayBox&, int, int, amrex::Geometry const&, double, amrex::Vector<amrex::BCRec, std::allocator<amrex::BCRec> > const&, int, int, amrex::FilccCell&&) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_PhysBCFunct.H:393

4: ./amr2d.gnu.MPI.CUDA.ex(+0x71dc5) [0x561c794e3dc5]
amrex::GpuBndryFuncFab::operator()(amrex::Box const&, amrex::FArrayBox&, int, int, amrex::Geometry const&, double, amrex::Vector<amrex::BCRec, std::allocator<amrex::BCRec> > const&, int, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_PhysBCFunct.H:204
(inlined by) dummy_fill(amrex::Box const&, amrex::FArrayBox&, int, int, amrex::Geometry const&, double, amrex::Vector<amrex::BCRec, std::allocator<amrex::BCRec> > const&, int, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../Source/NS_bcfill.H:272

5: ./amr2d.gnu.MPI.CUDA.ex(+0x3b94fa) [0x561c7982b4fa]
amrex::StateData::FillBoundary(amrex::Box const&, amrex::FArrayBox&, double, amrex::Geometry const&, int, int, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_StateData.cpp:556

6: ./amr2d.gnu.MPI.CUDA.ex(+0x3bb61d) [0x561c7982d61d]
amrex::StateDataPhysBCFunct::operator()(amrex::MultiFab&, int, int, amrex::IntVect const&, double, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_StateData.cpp:909

7: ./amr2d.gnu.MPI.CUDA.ex(+0x3b2795) [0x561c79824795]
std::enable_if<amrex::IsFabArray<amrex::MultiFab, void>::value, void>::type amrex::FillPatchSingleLevel<amrex::MultiFab, amrex::StateDataPhysBCFunct>(amrex::MultiFab&, amrex::IntVect const&, double, amrex::Vector<amrex::MultiFab*, std::allocator<amrex::MultiFab*> > const&, amrex::Vector<double, std::allocator<double> > const&, int, int, int, amrex::Geometry const&, amrex::StateDataPhysBCFunct&, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/AmrCore/AMReX_FillPatchUtil_I.H:159

8: ./amr2d.gnu.MPI.CUDA.ex(+0x3a9c82) [0x561c7981bc82]
std::vector<double, std::allocator<double> >::~vector() at /usr/include/c++/9/bits/stl_vector.h:677
(inlined by) amrex::Vector<double, std::allocator<double> >::~Vector() at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_Vector.H:25
(inlined by) amrex::FillPatchIterator::FillFromLevel0(double, int, int, int, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_AmrLevel.cpp:1102

9: ./amr2d.gnu.MPI.CUDA.ex(+0x3aa29d) [0x561c7981c29d]
amrex::FillPatchIterator::Initialize(int, double, int, int, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_AmrLevel.cpp:1016

10: ./amr2d.gnu.MPI.CUDA.ex(+0x3ab441) [0x561c7981d441]
amrex::AmrLevel::FillPatch(amrex::AmrLevel&, amrex::MultiFab&, int, double, int, int, int, int) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_AmrLevel.cpp:2113

11: ./amr2d.gnu.MPI.CUDA.ex(+0xc69af) [0x561c795389af]
NavierStokesBase::computeGradP(double) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../Source/NavierStokesBase.cpp:4291

12: ./amr2d.gnu.MPI.CUDA.ex(+0x840dd) [0x561c794f60dd]
NavierStokes::initData() at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../Source/NavierStokes.cpp:371

13: ./amr2d.gnu.MPI.CUDA.ex(+0x390a41) [0x561c79802a41]
std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count() at /usr/include/c++/9/bits/shared_ptr_base.h:729
(inlined by) std::__shared_ptr<amrex::BoxList, (__gnu_cxx::_Lock_policy)2>::__shared_ptr() at /usr/include/c++/9/bits/shared_ptr_base.h:1169
(inlined by) std::shared_ptr<amrex::BoxList>::~shared_ptr() at /usr/include/c++/9/bits/shared_ptr.h:103
(inlined by) amrex::BoxArray::~BoxArray() at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Base/AMReX_BoxArray.H:556
(inlined by) amrex::Amr::defBaseLevel(double, amrex::BoxArray const*, amrex::Vector<int, std::allocator<int> > const*) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_Amr.cpp:2504

14: ./amr2d.gnu.MPI.CUDA.ex(+0x39bc32) [0x561c7980dc32]
amrex::Amr::initialInit(double, double, amrex::BoxArray const*, amrex::Vector<int, std::allocator<int> > const*) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_Amr.cpp:1274
(inlined by) amrex::Amr::init(double, double) at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../../amrex/Src/Amr/AMReX_Amr.cpp:1142

15: ./amr2d.gnu.MPI.CUDA.ex(+0x437bb) [0x561c794b57bb]
main at /home/lli/PR-DNS/IAMR/Exec/eb_run2d/../../Source/main.cpp:96

16: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fa97ec3a0b3]

17: ./amr2d.gnu.MPI.CUDA.ex(+0x4d5de) [0x561c794bf5de]
?? ??:0

Could you please help with this?

Are you using CUDA 11.6? If so, the issue is probably similar to AMReX-Codes/amrex#2607. We are still investigating this. For now, maybe you can use a different version of CUDA.

Thanks for the quick response! I'm using CUDA 11.4, so maybe it's not the same issue.

Could you try amrex/Tests/Amr/Advection_AmrCore/Exec to see if it works? What GPU are you using? Could you provide the stdout of make so that we can see if the CUDA_ARCH provided to the compiler is consistent with your GPU?
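For reference, one quick way to check the arch matchup outside of AMReX/IAMR is a tiny standalone CUDA program along the following lines (just a sketch; the file name check_arch.cu is made up here). It prints the GPU's compute capability and the architecture the loaded kernel was actually built for, then launches a trivial kernel; a failure here is the same "invalid device function" failure mode as the abort above.

// check_arch.cu -- standalone sketch, not part of AMReX or IAMR.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void probe() {}

int main() {
    // Report what the hardware is.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    std::printf("GPU compute capability: %d.%d\n", prop.major, prop.minor);

    // Report what device code the binary actually contains for this kernel.
    // If no compatible code is embedded, this already fails with
    // cudaErrorInvalidDeviceFunction.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, probe);
    if (err != cudaSuccess) {
        std::printf("cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("kernel built for: sm_%d (PTX: compute_%d)\n",
                attr.binaryVersion, attr.ptxVersion);

    // Try an actual launch.
    probe<<<1, 1>>>();
    err = cudaDeviceSynchronize();
    std::printf("probe launch: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}

Built with the same -arch/-gencode flags that show up in your make output (for example, nvcc -arch=sm_86 check_arch.cu), it should report sm_86 and a successful launch on an RTX 3090; if it reports a different architecture or fails, the build flags are the likely culprit.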

I compiled amrex/Tests/Amr/Advection_AmrCore/Exec with CUDA, and it runs fine with its inputs.
I am using an RTX 3090 with sm_86, which should be correct according to NVIDIA's docs. Since the CUDA version works with some inputs, I guess the problem is not CUDA_ARCH. The make output for eb_run2d is attached: tmp.log.

I noticed that you must have local modifications in AMReX (as indicated by "AMReX (21.12-dirty) initialized" in the run output). Could you please try with clean versions of AMReX, AMReX-Hydro, and IAMR? Also note that a newer version of IAMR is not guaranteed to work with an older version of AMReX. I recommend checking out the most recent releases of both (22.02).

Thanks for the advice. I updated AMReX, AMReX-Hydro, and IAMR to the latest clean upstream versions, but the same problem persists. Is there any particular setting in these inputs that could cause this error?

I am unable to reproduce this error with CUDA 11.5. Would it be possible for you to switch to this version?

I'm also unable to reproduce the error with CUDA 11.4. Could you try with completely new clones of the repos? git can do unexpected things sometimes.

Closing this since it's been over 3 months since the last comment. @lingda-li please open a new issue if you're still having problems.