parthenon-hpc-lab/parthenon

Simple `bool` in `Params` prevents restarts

pgrete opened this issue · 2 comments

@BenWibking discovered an interesting bug (already mentioned on Matrix).

We don't quite understand why or how this could happen.

Upon restart, the code fails with:

    ### PARTHENON ERROR
      Condition:   rank == view_rank
      Message:     input and output view are same rank
      File:        /home/pgrete/src/athenapk/external/parthenon/src/outputs/parthenon_hdf5.hpp
      Line number: 203
    input and output view are same rank

Adding some debug output

    std::cout << "Reading var " << name << " with rank " << view_rank << " and expected rank " << rank << std::endl;

right before the failing check prints

    Reading var Hydro/vertical_driving_only with rank 1 and expected rank 0

That param is a simple `bool`:

    auto vertical_driving_only =
        pin->GetOrAddBoolean("precipitator/driving", "vertical_driving_only", false);
    hydro_pkg->AddParam("vertical_driving_only", vertical_driving_only, parthenon::Params::Mutability::Restart);

with no additional modifications of that param in the codebase.

It also appears to be stored correctly:

    ATTRIBUTE "Hydro/vertical_driving_only" {
       DATATYPE  H5T_STD_U8LE
       DATASPACE  SCALAR
       DATA {
       (0): 0
       }
    }

Commenting out the `AddParam` call results in restarts that no longer crash.

I don't understand how this is happening. We have other `bool`s stored in `Params` that do not cause any issues.

Any ideas? (Particularly @Yurlungur, since you're also looking at outputs.)

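Bonus point: the `HDF5ReadAttribute` function that fails is, in principle, declared as `template <typename T, REQUIRES(implements<kokkos_view(T)>::value)>`, i.e., it is restricted to Kokkos views, so a plain `bool` should never end up there. That makes this even more puzzling.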

This sounds like some of the SFINAE in the `HDF5ReadAttribute` code is broken. I will try to debug it and will probably put the fix in the same branch.