PrincetonUniversity/athena

AMR in GR Does Not Work (Undefined Behavior)

Closed this issue · 6 comments

Prerequisite checklist


  • Did you check that the issue hasn't already been reported?
  • Did you check the documentation in the Wiki for an answer?
  • Are you running the latest version of Athena++?

Summary of issue

Running the code with GR and AMR, without magnetic fields, produces artificial clumping in the density. Running with MHD but with B=0 everywhere initially removes that density clumping, but the magnetic field becomes nonzero in the same pattern. This seems to happen with MPI and AMR turned on in 3D.

For example, this is bsq after some time (the BH is at the center). The AMR levels are concentrated around x=0, z=20 rg, where x is horizontal and z is vertical.
(attached screenshot)

Steps to reproduce
Run a simple Bondi-type flow (uniform initial conditions with B=0) with a refinement condition that produces at least two levels of refinement (not sure if this is required); a sketch of such a condition is given below. Nonzero fields arise by the first time step.
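For reference, a minimal sketch of a user refinement condition of this kind, written against the standard Athena++ AMR interface (EnrollUserRefinementCondition with a function returning 1 to refine, 0 to do nothing, -1 to derefine). The target point, half-width, and function name are illustrative and not taken from the actual problem generator:

// Sketch for a problem generator file; assumes the usual pgen includes
// (athena.hpp, mesh/mesh.hpp, coordinates/coordinates.hpp, <cmath>) and
// Cartesian-like coordinates so that x1 is horizontal and x3 is vertical.
int RefinementCondition(MeshBlock *pmb) {
  const Real x_target = 0.0;    // illustrative: region around x = 0
  const Real z_target = 20.0;   // illustrative: region around z = 20 rg
  const Real half_width = 5.0;  // illustrative half-width of the region
  for (int k = pmb->ks; k <= pmb->ke; ++k) {
    for (int j = pmb->js; j <= pmb->je; ++j) {
      for (int i = pmb->is; i <= pmb->ie; ++i) {
        Real x = pmb->pcoord->x1v(i);
        Real z = pmb->pcoord->x3v(k);
        if (std::abs(x - x_target) < half_width && std::abs(z - z_target) < half_width)
          return 1;   // flag this MeshBlock for refinement
      }
    }
  }
  return 0;           // otherwise leave the block as is
}

void Mesh::InitUserMeshData(ParameterInput *pin) {
  if (adaptive)
    EnrollUserRefinementCondition(RefinementCondition);
}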


Additional comments and/or proposed solution
I have tracked down the issue to these two functions (and the 4D versions as well):

namespace BufferUtility {
//----------------------------------------------------------------------------------------
//! \fn template <typename T> void PackData(const AthenaArray<T> &src, T *buf,
//!     int sn, int en, int si, int ei, int sj, int ej, int sk, int ek, int &offset)
//! \brief pack a 4D AthenaArray into a one-dimensional buffer

template <typename T> void PackData(const AthenaArray<T> &src, T *buf,
         int sn, int en, int si, int ei, int sj, int ej, int sk, int ek, int &offset) {
  for (int n=sn; n<=en; ++n) {
    for (int k=sk; k<=ek; k++) {
      for (int j=sj; j<=ej; j++) {
#pragma omp simd
        for (int i=si; i<=ei; i++)
          buf[offset++] = src(n,k,j,i);
      }
    }
  }
}

//----------------------------------------------------------------------------------------
//! \fn template <typename T> void PackData(const AthenaArray<T> &src, T *buf,
//!                     int si, int ei, int sj, int ej, int sk, int ek, int &offset)
//! \brief pack a 3D AthenaArray into a one-dimensional buffer

template <typename T> void PackData(const AthenaArray<T> &src, T *buf,
                                    int si, int ei, int sj, int ej, int sk, int ek,
                                    int &offset) {
  for (int k=sk; k<=ek; k++) {
    for (int j=sj; j<=ej; j++) {
#pragma omp simd
      for (int i=si; i<=ei; i++)
        buf[offset++] = src(k, j, i);
    }
  }
  return;
}

For some reason, offset is calculated correctly inside these functions but is occasionally returned with an incorrect value (one less than the correct value). As a result, ssize in the following section is wrong, and CopyVariableBufferSameProcess leaves off the last one or two elements of data, so some cells in the boundary are not set properly.

void BoundaryVariable::SendBoundaryBuffers() {
  MeshBlock *pmb = pmy_block_;
  int mylevel = pmb->loc.level;
  for (int n=0; n<pbval_->nneighbor; n++) {
    NeighborBlock& nb = pbval_->neighbor[n];
    if (bd_var_.sflag[nb.bufid] == BoundaryStatus::completed) continue;
    int ssize;
    if (nb.snb.level == mylevel)
      ssize = LoadBoundaryBufferSameLevel(bd_var_.send[nb.bufid], nb);
    else if (nb.snb.level<mylevel)
      ssize = LoadBoundaryBufferToCoarser(bd_var_.send[nb.bufid], nb);
    else
      ssize = LoadBoundaryBufferToFiner(bd_var_.send[nb.bufid], nb);
    if (nb.snb.rank == Globals::my_rank) {  // on the same process
      CopyVariableBufferSameProcess(nb, ssize);
    }
#ifdef MPI_PARALLEL
    else  // MPI
      MPI_Start(&(bd_var_.req_send[nb.bufid]));
#endif
    bd_var_.sflag[nb.bufid] = BoundaryStatus::completed;
  }
  return;
}

I think this is clearly a result of undefined behavior, but I have not been able to track down the root cause. I can kluge it by replacing ssize with the correct value calculated from the loop limits, but that is not very satisfying. The only thing I can think of is that the buffers are not being allocated properly, but I can't find any evidence of that. I have no idea why this only becomes a problem with AMR and GR, etc. The odd behavior of the variable offset seems to occur robustly, so I don't know why it doesn't cause an issue in all configurations. This bug has been driving me crazy. Any ideas?
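For concreteness, a minimal sketch of that kluge for the 3D variant (illustrative only, and not the fix that was eventually adopted); it assumes, as observed above, that the buffer contents themselves are packed correctly and only the returned offset is wrong:

template <typename T> void PackData(const AthenaArray<T> &src, T *buf,
                                    int si, int ei, int sj, int ej, int sk, int ek,
                                    int &offset) {
  int start = offset;
  for (int k=sk; k<=ek; k++) {
    for (int j=sj; j<=ej; j++) {
#pragma omp simd
      for (int i=si; i<=ei; i++)
        buf[offset++] = src(k, j, i);
    }
  }
  // Ignore the (occasionally off-by-one) incremented counter and set offset
  // from the element count implied by the loop limits instead.
  offset = start + (ek-sk+1)*(ej-sj+1)*(ei-si+1);
  return;
}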

Confession: I have been using an old version of Athena++, and that is what I used for debugging, but I checked that this also happens in the latest public version.

Those #pragma omp simd statements are suspiciously close to the problem. If you remove those, does anything change?

Dang it, yes! It seems to work for the first time step at least. I will see what happens when I run the simulation longer.

Do you know why simd might be causing this?

I actually wouldn't have guessed that those loops were safe to force vectorize. Whether or not they were, forcing simd has proven to be a bit buggy on more than one occasion.
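To spell out the concern: buf[offset++] updates offset on every iteration, so the loop body carries a dependence from one iteration to the next, and #pragma omp simd asks the compiler to vectorize anyway. A hedged sketch of how the inner loop could be written so that no cross-iteration update remains (illustrative only, not code from the repository):

      // Each lane writes to an index computed from i alone; offset is updated
      // once, after the loop, so the vectorized body has no loop-carried state.
#pragma omp simd
      for (int i=si; i<=ei; i++)
        buf[offset + (i - si)] = src(k, j, i);
      offset += ei - si + 1;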

If those data packing routines were actually a performance bottleneck (which I doubt), it would be easy enough to replace the i-loops with memmove instructions, which would be unbeatable in terms of performance.
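A minimal sketch of that memmove-based version (3D variant only), assuming AthenaArray data are contiguous with unit stride in i, so that &src(k,j,si) points at ei-si+1 consecutive elements, and that T is trivially copyable:

#include <cstring>  // std::memmove

template <typename T> void PackData(const AthenaArray<T> &src, T *buf,
                                    int si, int ei, int sj, int ej, int sk, int ek,
                                    int &offset) {
  const std::size_t ni = static_cast<std::size_t>(ei - si + 1);
  for (int k=sk; k<=ek; k++) {
    for (int j=sj; j<=ej; j++) {
      // Copy one contiguous i-row at a time instead of a per-element loop.
      std::memmove(buf + offset, &src(k, j, si), ni*sizeof(T));
      offset += static_cast<int>(ni);
    }
  }
  return;
}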

I agree with @c-white that this simd pragma is not crucial for performance. Modern compilers should be smart enough to do it right.

However, this pragma has been there almost from the beginning of Athena++'s history. @smressle, can you tell us which compiler (including its version) and OS you are using?

I can confirm that removing the simd pragmas seems to have fixed the issue. Thanks! I'm not sure yet whether performance has been affected much, but I didn't notice a big change.

I am using

icpc (ICC) 19.1.3.304 20200925

on
Operating System: CentOS Linux 8 (Core)
CPE OS Name: cpe:/o:centos:centos:8
Kernel: Linux 4.18.0-193.28.1.el8_2.x86_64
Architecture: x86-64

I have been using this setup to run simulations for a while, and this is the first time I have had an obvious error like this arise (it is also the first time I have tried using AMR). I am very confused about why it only causes a problem for the GR + AMR combination (it also seemed to affect MHD + AMR runs slightly for me), but that could explain why it hasn't made itself known before.

As the offending simd pragmas were removed when radiation was merged last year, I'm marking this as completed.