Segmentation fault when using AMR and passive scalars with user-defined boundary functions
sunnywong314 opened this issue · 12 comments
Prerequisite checklist
Place an X in between the brackets on each line as you complete these checks:
- Did you check that the issue hasn't already been reported?
- Did you check the documentation in the Wiki for an answer?
- Are you running the latest version of Athena++?
Summary of issue
I am running my own problem generator (based on code by @c-white) in which supernova ejecta enters the box (as a user boundary condition) and interacts with a companion star. When running with AMR, the code exits with a segmentation fault (with the input provided, at cycle 332, code time 0.38). I am relatively new to Athena++, so any help/pointers are greatly appreciated.
Steps to reproduce
Configure:
python configure.py --prob test_eos_ejecta3 --coord cartesian --flux hllc --nghost 4 --grav mg -mpi -hdf5
Compile and run:
make clean; make
mpirun -n 40 bin/athena -i inputs/model.athinput time/tlim=2.0
Input files (placed in the inputs folder; please remove the .txt extension):
donor_1p0_0p21_4p0_0.08.data.txt
model.athinput.txt
Version info
- Athena++ version: 24.0
- Compiler and version: g++ 11.4.0
- Operating system: Rocky Linux 8.10
- Hardware and cluster name (if applicable):
- External library versions (if applicable): openmpi/4.0.7 , hdf5/mpi-1.10.9
We cannot tell what is causing the problem without seeing your code. I suggest running the code under gdb (or analyzing the dumped core file with it) to identify where it died. It is a bit tricky to run gdb with MPI, but you can google it.
My apologies, I forgot to attach my problem generator
test_eos_ejecta3.cpp.txt
I will look into running gdb with MPI. Thank you for the suggestion.
I could not catch anything causing the segmentation fault, but I'm afraid your boundary conditions probably cause another problem. You are directly accessing the passive scalar array in the boundary functions, but with AMR we also need to apply the boundary conditions on what we call the coarse buffer used for AMR prolongation.
@c-white @felker do you remember the correct way to apply the boundary conditions on the scalar variables?
However, user-defined boundary conditions are currently unsupported for NSCALARS > 0, since there is no AthenaArray<Real> &r parameter in the function signature. This cannot be hacked around in the way shown in the attached pgen file.
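For reference, here is a sketch of the user-defined boundary function signature (the BValFunc typedef; verify against athena.hpp in your checkout). Hydro primitives and face-centered fields are passed in, but nothing for the passive scalars:

```c++
// Sketch of the user-defined boundary function signature (BValFunc typedef);
// check athena.hpp in your checkout for the exact form. Hydro primitives (prim)
// and face-centered fields (b) are passed in, but there is no
// AthenaArray<Real> &r argument for the passive scalars.
using BValFunc = void (*)(
    MeshBlock *pmb, Coordinates *pco, AthenaArray<Real> &prim, FaceField &b,
    Real time, Real dt,
    int il, int iu, int jl, int ju, int kl, int ku, int ngh);
```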
To elaborate, the user-defined boundary functions get called during the prolongation step of refinement in ApplyPhysicalBoundariesOnCoarseLevel(), under call stacks at sites like this one:
athena/src/bvals/bvals_refine.cpp, lines 451 to 459 in 185473d
You'll note that ph->coarse_prim_ and other refinement-specific variable buffers are what the user-defined boundary condition is being applied to there, not always ph->w. That is why hardcoding lines like the following in your boundary condition functions:
AthenaArray<Real> &prim_scalar = pmb->pscalars->r;
prim_scalar(0,k,j,i) = 0.0;
prim_scalar(1,k,j,i) = 0.0;
won't work. The function needs to be made generic enough to apply to a function parameter for, e.g., ps->coarse_r_.
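For illustration, the hydro variables already follow the generic pattern the scalars would need: the function writes only through the prim argument it is handed, which may alias either ph->w or ph->coarse_prim_ depending on the call site. A minimal sketch, assuming an inner-x1 outflow-like boundary (the function name and the copied values are placeholders, not the actual pgen):

```c++
// Minimal sketch: a user boundary function that stays generic by writing only
// through the prim argument, so it behaves correctly whether prim aliases
// ph->w or ph->coarse_prim_. Function name and values are placeholders.
void OutflowLikeInnerX1(MeshBlock *pmb, Coordinates *pco, AthenaArray<Real> &prim,
                        FaceField &b, Real time, Real dt,
                        int il, int iu, int jl, int ju, int kl, int ku, int ngh) {
  for (int k=kl; k<=ku; ++k) {
    for (int j=jl; j<=ju; ++j) {
      for (int i=1; i<=ngh; ++i) {
        prim(IDN,k,j,il-i) = prim(IDN,k,j,il);  // copy the first active cell outward
        prim(IVX,k,j,il-i) = prim(IVX,k,j,il);
        prim(IVY,k,j,il-i) = prim(IVY,k,j,il);
        prim(IVZ,k,j,il-i) = prim(IVZ,k,j,il);
        prim(IPR,k,j,il-i) = prim(IPR,k,j,il);
        // There is no scalar argument here, so nothing in this function can
        // touch the passive scalars safely -- hence the workarounds below.
      }
    }
  }
}
```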
Or you can follow @yanfeij's lead in #492 and have separate user-defined boundary functions for passive scalars like he made for the radiation intensity:
Lines 667 to 690 in 185473d
Since your user-defined boundary functions are mostly outflow, I would try hardcoding calls to the built-in outflow functions for only the passive scalars in void BoundaryValues::DispatchBoundaryFunctions, and calling your user-defined function on the other variables.
Thanks @tomidakn and @felker -- this was sort of working with an earlier version of the codebase, but perhaps that was just luck. At least there are a couple of ways forward for fixing the passive scalar boundaries. @sunnywong314 I can help pursue one of them. For this project, I'm inclined to do some quick and dirty pointer comparisons, so that only the pgen file needs to be modified, but I'll spend a little time having a closer look at the code.
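For the record, a minimal sketch of that pointer-comparison idea (this is not the actual fix; the coarse_prim_ and coarse_r_ member names follow the discussion above, and their accessibility from a pgen is an assumption to verify against hydro.hpp and scalars.hpp):

```c++
// Sketch of the "pointer comparison" idea: inside the user boundary function,
// check which primitive buffer we were handed and pick the matching scalar
// array. Member names/accessibility are assumptions; values are placeholders.
void EjectaInnerX1(MeshBlock *pmb, Coordinates *pco, AthenaArray<Real> &prim,
                   FaceField &b, Real time, Real dt,
                   int il, int iu, int jl, int ju, int kl, int ku, int ngh) {
  AthenaArray<Real> *pr = nullptr;
  if (NSCALARS > 0) {
    if (&prim == &(pmb->phydro->coarse_prim_)) {
      pr = &(pmb->pscalars->coarse_r_);  // prolongation pass on the coarse buffer
    } else {
      pr = &(pmb->pscalars->r);          // regular pass on the fine primitives
    }
  }
  for (int k=kl; k<=ku; ++k) {
    for (int j=jl; j<=ju; ++j) {
      for (int i=1; i<=ngh; ++i) {
        // ... set prim(IDN,k,j,il-i), etc., as in the original pgen ...
        if (pr != nullptr) {
          for (int n=0; n<NSCALARS; ++n)
            (*pr)(n,k,j,il-i) = 0.0;  // placeholder scalar ghost values
        }
      }
    }
  }
}
```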
@tomidakn @felker @c-white Many thanks for looking into this!
Passive scalars didn't cause the segmentation fault, but it is good to know that the hack in the boundary function doesn't work.
I removed all passive-scalar-related lines from the problem generator for clarity:
test_eos_ejecta3.cpp.txt
and scaled down the problem so that it runs faster
model.athinput.txt
I get a segmentation fault if I configure with:
python configure.py --prob test_eos_ejecta3 --coord cartesian --flux hllc --nghost 4 --grav mg -mpi
make clean; make
and run with:
mpirun -n 20 bin/athena -i inputs/model.athinput time/tlim=2
However, if I configure without MPI:
python configure.py --prob test_eos_ejecta3 --coord cartesian --flux hllc --nghost 4 --grav mg
and run with bin/athena -i inputs/model.athinput, then the segmentation fault goes away.
The segmentation fault also goes away if I configure with the -debug option with MPI still on:
python configure.py --prob test_eos_ejecta3 --coord cartesian --flux hllc --nghost 4 --grav mg -mpi -debug
and run with mpirun -n 20 bin/athena -i inputs/model.athinput time/tlim=2
I tracked down 9d763ac as the first commit that gave me the segmentation fault. All earlier commits that I tested, going back to 2bd7c69 from March 2021, were fine.
I haven't learned how to run a debugger with MPI, so I don't know which line of the code gave me the segmentation fault.
Here are the modules I have:
1) modules/2.2-20230808 (S)  2) slurm (S)  3) gcc/11.4.0  4) openmpi/4.0.7  5) hdf5/mpi-1.10.9
The modules are the same at compile time and at run time.
mpicxx --version:
g++ (Spack GCC) 11.4.0
OK, it sounds like my fault. I'll take a look.
Can you try it with nghost=2?
nghost = 2 still gives the segmentation fault (note: previous runs used xorder = 3, and for this I changed to xorder = 2).
I could reproduce your issue with g++ (8.5.0) + Intel MPI, but not with icpc (2023) + Intel MPI, so this issue seems to be g++-specific.
@sunnywong314 To try this on Popeye:
module load modules/2.3-20240529 intel-oneapi-compilers/2024.1.0 intel-oneapi-mpi/intel-2021.12.0 hdf5/intel-mpi-1.14.3
python configure.py --prob test_eos_ejecta3 --coord cartesian --flux hllc --nghost 4 --grav mg --cxx icpc -mpi -hdf5 --mpiccmd mpiicpx
In your submission script, try either srun or mpirun. Hopefully this runs smoothly, and it might even run faster.
I tested the latest code with g++ and Intel MPI but with another pgen, and it ran smoothly. So I'm afraid this issue is very subtle and specific to your pgen.