libMesh/libmesh

error when compiling with OpenMPI 4.1.4 and gcc-12.3.0

gregfi opened this issue · 11 comments

gregfi commented

Git revision....................... : 5e720a226ba3740dbe92c135936612461f489ada

I get the following error when compiling with OpenMPI 4.1.4 and gcc-12.3.0

  CXX      utilities/src/libtimpi_opt_la-timpi_version.lo
In file included from ../../../../contrib/timpi/src/utilities/include/timpi/timpi_call_mpi.h:29,
                 from ../../../../contrib/timpi/src/parallel/include/timpi/data_type.h:23,
                 from ../../../../contrib/timpi/src/parallel/include/timpi/status.h:23,
                 from ../../../../contrib/timpi/src/parallel/include/timpi/request.h:24,
                 from ../../../../contrib/timpi/src/parallel/src/request.C:19:
/home/fischega/src/direwolf/openmpi-4.1.4_gcc/include/mpi.h:400:47: error: cast from ‘void*’ is not allowed
  400 | #define OMPI_PREDEFINED_GLOBAL(type, global) (static_cast<type> (static_cast<void *> (&(global))))
      |                                              ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/fischega/src/direwolf/openmpi-4.1.4_gcc/include/mpi.h:855:26: note: in expansion of macro ‘OMPI_PREDEFINED_GLOBAL’
  855 | #define MPI_REQUEST_NULL OMPI_PREDEFINED_GLOBAL(MPI_Request, ompi_request_null)
      |                          ^~~~~~~~~~~~~~~~~~~~~~
../../../../contrib/timpi/src/parallel/include/timpi/request.h:112:43: note: in expansion of macro ‘MPI_REQUEST_NULL’
  112 |   static constexpr request null_request = MPI_REQUEST_NULL;
      |                                           ^~~~~~~~~~~~~~~~
Makefile:1209: recipe for target 'parallel/src/libtimpi_opt_la-request.lo' failed
make[3]: *** [parallel/src/libtimpi_opt_la-request.lo] Error 1
make[3]: *** Waiting for unfinished jobs....
In file included from ../../../../contrib/timpi/src/utilities/include/timpi/timpi_call_mpi.h:29,
                 from ../../../../contrib/timpi/src/parallel/include/timpi/data_type.h:23,
                 from ../../../../contrib/timpi/src/parallel/include/timpi/standard_type.h:23,
                 from ../../../../contrib/timpi/src/parallel/include/timpi/communicator.h:23,
                 from ../../../../contrib/timpi/src/utilities/src/timpi_assert.C:24:
/home/fischega/src/direwolf/openmpi-4.1.4_gcc/include/mpi.h:400:47: error: cast from ‘void*’ is not allowed
  400 | #define OMPI_PREDEFINED_GLOBAL(type, global) (static_cast<type> (static_cast<void *> (&(global))))
      |                                              ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/fischega/src/direwolf/openmpi-4.1.4_gcc/include/mpi.h:855:26: note: in expansion of macro ‘OMPI_PREDEFINED_GLOBAL’
  855 | #define MPI_REQUEST_NULL OMPI_PREDEFINED_GLOBAL(MPI_Request, ompi_request_null)
      |                          ^~~~~~~~~~~~~~~~~~~~~~
../../../../contrib/timpi/src/parallel/include/timpi/request.h:112:43: note: in expansion of macro ‘MPI_REQUEST_NULL’
  112 |   static constexpr request null_request = MPI_REQUEST_NULL;
      |                                           ^~~~~~~~~~~~~~~~
In file included from ../../../../contrib/timpi/src/parallel/include/timpi/message_tag.h:28,
                 from ../../../../contrib/timpi/src/parallel/src/message_tag.C:19:
/home/fischega/src/direwolf/openmpi-4.1.4_gcc/include/mpi.h:400:47: error: cast from ‘void*’ is not allowed
  400 | #define OMPI_PREDEFINED_GLOBAL(type, global) (static_cast<type> (static_cast<void *> (&(global))))
      |                                              ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/fischega/src/direwolf/openmpi-4.1.4_gcc/include/mpi.h:855:26: note: in expansion of macro ‘OMPI_PREDEFINED_GLOBAL’
  855 | #define MPI_REQUEST_NULL OMPI_PREDEFINED_GLOBAL(MPI_Request, ompi_request_null)
      |                          ^~~~~~~~~~~~~~~~~~~~~~
../../../../contrib/timpi/src/parallel/include/timpi/request.h:112:43: note: in expansion of macro ‘MPI_REQUEST_NULL’
  112 |   static constexpr request null_request = MPI_REQUEST_NULL;
      |                                           ^~~~~~~~~~~~~~~~
In file included from ../../../../contrib/timpi/src/utilities/include/timpi/timpi_call_mpi.h:29,
                 from ../../../../contrib/timpi/src/parallel/include/timpi/data_type.h:23,
                 from ../../../../contrib/timpi/src/parallel/include/timpi/standard_type.h:23,
                 from ../../../../contrib/timpi/src/parallel/include/timpi/communicator.h:23,
                 from ../../../../contrib/timpi/src/parallel/src/communicator.C:20:
/home/fischega/src/direwolf/openmpi-4.1.4_gcc/include/mpi.h:400:47: error: cast from ‘void*’ is not allowed
  400 | #define OMPI_PREDEFINED_GLOBAL(type, global) (static_cast<type> (static_cast<void *> (&(global))))
      |                                              ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/fischega/src/direwolf/openmpi-4.1.4_gcc/include/mpi.h:855:26: note: in expansion of macro ‘OMPI_PREDEFINED_GLOBAL’
  855 | #define MPI_REQUEST_NULL OMPI_PREDEFINED_GLOBAL(MPI_Request, ompi_request_null)
      |                          ^~~~~~~~~~~~~~~~~~~~~~
../../../../contrib/timpi/src/parallel/include/timpi/request.h:112:43: note: in expansion of macro ‘MPI_REQUEST_NULL’
  112 |   static constexpr request null_request = MPI_REQUEST_NULL;
      |                                           ^~~~~~~~~~~~~~~~
In file included from ../../../../contrib/timpi/src/utilities/include/timpi/timpi_init.h:33,
                 from ../../../../contrib/timpi/src/utilities/src/timpi_init.C:20:
/home/fischega/src/direwolf/openmpi-4.1.4_gcc/include/mpi.h:400:47: error: cast from ‘void*’ is not allowed
  400 | #define OMPI_PREDEFINED_GLOBAL(type, global) (static_cast<type> (static_cast<void *> (&(global))))
      |                                              ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/fischega/src/direwolf/openmpi-4.1.4_gcc/include/mpi.h:855:26: note: in expansion of macro ‘OMPI_PREDEFINED_GLOBAL’
  855 | #define MPI_REQUEST_NULL OMPI_PREDEFINED_GLOBAL(MPI_Request, ompi_request_null)
      |                          ^~~~~~~~~~~~~~~~~~~~~~
../../../../contrib/timpi/src/parallel/include/timpi/request.h:112:43: note: in expansion of macro ‘MPI_REQUEST_NULL’
  112 |   static constexpr request null_request = MPI_REQUEST_NULL;
      |                                           ^~~~~~~~~~~~~~~~

I've attached a copy of libmesh_diagnostic.log.

libmesh_diagnostic.log

A few questions:

Can you even compile a "hello world" program with an MPI_REQUEST_NULL in it? I assume the answer is "yes" but I want to ensure there's no possibility that this is some weird OpenMPI-4.1.4/gcc-12 incompatibility.

If you change that static constexpr to static const, does that fix the compilation? I fear the answer will also be "yes".

If so, we might have gotten too greedy here. Your error message doesn't match what I'm seeing from g++ 11 on some test cases, but there are rules about address-taking and casting inside a constexpr, and I don't think OpenMPI's headers obey them. The MPI standard certainly makes no promises that they will, so we shouldn't have expected to be able to use constexpr; we've just gotten lucky with MPICH.

Pinging @loganharbour to ask: how hard would it be to get an OpenMPI container set up for a few Civet recipes? I used to use OpenMPI exclusively myself, so it didn't bother me much that we were only using MPICH for automatic testing. But ever since open-mpi/ompi#10525 hit me last year I've been using PETSc-downloaded MPICH, and apparently our OpenMPI compatibility has already started going downhill. The OpenMPI people were super quick about fixing that bug, but I've been too lazy to build a fixed version from scratch; I keep hoping my Linux distribution will finally get it into the repos...

If you change that static constexpr to static const, does that fix the compilation? I fear the answer will also be "yes".

My money is on yes.

I ran into this issue in another context with g++12 and OpenMPI. static const was the easiest fix on our side.

paranumal/hipBone#33

there are rules about what you can do with respect to address-taking and casting in a constexpr, and I don't think OpenMPI is obeying them

100% agree here.

inline static const in the case I referenced.

Pinging @loganharbour to ask: how hard would it be to get an OpenMPI container set up for a few Civet recipes?

Trivial. I need to tag a container for you so that you have a constant version in libmesh, but you should be able to take any of the recipes with moose-[dev, petsc, libmesh]-x86_64 and use moose-[dev, petsc, libmesh]-openmpi-x86_64 instead.

gregfi commented

Thanks for the quick attention!

A few questions:

Can you even compile a "hello world" program with an MPI_REQUEST_NULL in it? I assume the answer is "yes" but I want to ensure there's no possibility that this is some weird OpenMPI-4.1.4/gcc-12 incompatibility.

My C++-fu is practically non-existent. I can try something, but can you post the specific "hello world" you want me to try?
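For reference, something along these lines would do; it exercises `MPI_REQUEST_NULL` both at runtime and in the `constexpr` context that TIMPI's request.h uses. The file name and `mpicxx` invocation are just suggestions, and the `constexpr` line is the part expected to fail under the OpenMPI 4.1.4 headers with g++ 12:

```cpp
// hello_request.cpp -- compile with: mpicxx -std=c++17 hello_request.cpp
#include <mpi.h>
#include <cstdio>

// This mirrors TIMPI's request.h and is the line expected to fail:
static constexpr MPI_Request null_request = MPI_REQUEST_NULL;

int main(int argc, char ** argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Request req = MPI_REQUEST_NULL;  // runtime use, should always work
  std::printf("Hello from rank %d; null request? %d\n",
              rank, req == null_request);
  MPI_Finalize();
  return 0;
}
```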

If you change that static constexpr to static const, does that fix the compilation? I fear the answer will also be "yes".

When I change static constexpr to static const in timpi/src/parallel/include/timpi/request.h, I get (abbreviated):

In file included from ../../../../contrib/timpi/src/parallel/src/request.C:19:
../../../../contrib/timpi/src/parallel/include/timpi/request.h:112:24: error: ‘constexpr’ needed for in-class initialization of static data member ‘ompi_request_t* const TIMPI::Request::null_request’ of non-integral type [-fpermissive]

Wait a minute. I just failed to replicate this ... because it got fixed already. In April, in libMesh/TIMPI#123, merged into libMesh in #3539.

Looks like you're using a git revision from a week or two earlier. Update?

gregfi commented

OK. We should probably pull down new versions of the whole nine yards from INL. Let me see if I can figure out how to do that. It may take me a few days.

Whoa! I just realized you're going to need to pull down a new OpenMPI too. You say you're using 4.1.4, but the fix for that libMesh-destroying bug went into 4.1.5.

gregfi commented

I re-compiled with updated libMesh and OpenMPI 4.1.6 and the issue is resolved. Thanks!

You're welcome!

I keep hoping my Linux distribution will finally get it into the repos...

For posterity's sake: it's in, as long as you're willing to upgrade away from the LTS release. Ubuntu 23.10 has OpenMPI 4.1.5.