eschnett/MPItrampoline

Issues building Global Arrays

Opened this issue · 9 comments

I'm trying to build the Global Arrays library with MPItrampoline and have run into some issues. This package is a dependency for Molpro and several other computational chemistry applications. Any assistance that you can provide to get it working would be greatly appreciated.

The compilation fails at https://github.com/GlobalArrays/ga/blob/f4016b869dfd1a2b2856f74a73dd9452dbbc8ae4/comex/src-mpi-pr/comex.c#L4968
with the error message:

libtool: compile:  mpicc -DHAVE_CONFIG_H -I. -I./src-common -I./src-mpi-pr -g -O2 -MT src-mpi-pr/comex.lo -MD -MP -MF src-mpi-pr/.deps/comex.Tpo -c src-mpi-pr/comex.c -o src-mpi-pr/comex.o
src-mpi-pr/comex.c: In function ‘str_mpi_retval’:
src-mpi-pr/comex.c:4932:9: error: case label does not reduce to an integer constant
         case MPI_SUCCESS       : msg = "MPI_SUCCESS"; break;
         ^

Steps to reproduce:

git clone -b develop https://github.com/GlobalArrays/ga
cd ga
./autogen.sh
./configure --with-mpi-pr --with-blas=no --with-lapack=no --with-scalapack=no --disable-f77
make

I've tried both the develop and master branches.

There are lots of configurations which can be used, we're most interested in the recommended port which uses MPI-1 with progress ranks (--with-mpi-pr) but I tried several other configurations without success.

I'm not sure whether it's an MPItrampoline issue or whether it's non-standard use of MPI within Global Arrays.

I've attached the full output from build
build-ga.txt

I've tried on

  • RHEL 7.9 with gcc 4.8.5
  • Ubuntu 22.04 with gcc 11.3.0

This is a bug in MPItrampoline. MPI return codes are required to be literal integer constants, which means they can be used case-switch statements.

A.1 A.1.1
Defined Values and Handles
Defined Constants

The C and Fortran names are listed below. Constants with the type const int may also be implemented as literal integer constants substituted by the preprocessor.

Error classes

C type: const int (or unnamed enum)
Fortran type: INTEGER

MPI_SUCCESS

I can change GA to work around this, but I'm not going to do that, because the code is conforming to the MPI standard. For contrast, using MPI_INT in a case-switch is not legal and I fix that issue whenever I see it.

I think the fix in MPItrampoline requires a shim that translates to/from the MPI implementation error codes to the ones MPItrampoline uses. I happen to be intimately familiar with this issue, because I need to use it in my mostly-ABI-agnostic MPI F08 module implementation (code).

The good news is the MPI_SUCCESS is required to be zero (0 = MPI_SUCCESS < MPI_T_ERR_XXX ≤ MPI_ERR_LASTCODE from MPI 4.0 §15.3.10) so one can test that for me and only do the translation as a cold code path.

I disagree with Jeff's assessment. While MPI_SUCCESS can be implemented as const int, const int values cannot be used in switch statements in C. (In C++ they could, but const is weaker in C.) In particular, const int values are not necessarily compile-time constants; they might only be defined by the linker.

Here is a brief example.

Please create an MPI issue for this: https://github.com/mpi-forum/mpi-issues/issues.

We currently have example code in the standard using case-switch on one of the literal integer constants (see MPI_TYPE_MATCH_SIZE and Example 5.20), so if you are right, there is a logical inconsistency in the standard.

The following (MPI 4.0 §2.5.4) supports your argument, by the way, and I hope you are right, but it requires an unambiguous resolution.

All named constants, with the exceptions noted below for Fortran, can be used in initialization expressions or assignments, but not necessarily in array declarations or as labels in C switch or Fortran select/case statements. This implies named constants to be link-time but not necessarily compile-time constants.

I'll reopen the GA issue and fix that this week.

The statement you quote from the standard has an issue: "initialization expressions" are very different in C and Fortran. C allows link-time constants here, but Fortran does not. Put differently, the distinction between "initialization expressions" and "switch/case statements" does not exist in Fortran – the requirement that initializations expressions can use these constants requires that they are defined as parameter.

I'll add a comment to the issue mpi-forum/mpi-issues#657 you just created.

Compilation also hits a similar issue for MPI_COMM_NULL, MPI_GROUP_NULL and MPI_REQUEST_NULL

https://github.com/GlobalArrays/ga/blob/f4016b869dfd1a2b2856f74a73dd9452dbbc8ae4/comex/src-mpi-pr/groups.c#L30

libtool: compile:  mpicc -std=gnu11 -DHAVE_CONFIG_H -I. -I./src-common -I./src-mpi-pr -g -O2 -MT src-mpi-pr/groups.lo -MD -MP -MF src-mpi-pr/.deps/groups.Tpo -c src-mpi-pr/groups.c -o src-mpi-pr/groups.o
src-mpi-pr/groups.c:30:5: error: initializer element is not constant
     MPI_COMM_NULL,
     ^
src-mpi-pr/groups.c:30:5: error: (near initialization for ‘g_state.comm’)

2.5.4 Named Constants
MPI also provides predefined named constant handles, such as MPI_COMM_WORLD.
All named constants, with the exceptions noted below for Fortran, can be used in initialization expressions or assignments...

That needs to be fixed in MPItrampoline.

If you want a fix for that one, I can make it as a one-off, but I won't commit it. You can replace the struct initializer with {0} and it should be fine.

If you want a fix for that one, I can make it as a one-off, but I won't commit it. You can replace the struct initializer with {0} and it should be fine.

Thanks, that's a better workaround than the one I used.
We can keep it as a local patch until there's a permanent solution.

I don't know exactly what you end goal is, but to really make GA binaries flexible, one should be able to dynamically swap ARMCI libraries. MPI-PR is optimal some of the time, while MPI-TS is optimal for a few cases, and ARMCI-MPI is optimal for others. There will be an ABI conflict there, but if you were able to build libga.so with each, you would probably have the same GA ABI.