open-mpi/ompi

XL Fortran Compiler F08 Bindings broken for OpenSHMEM

jjhursey opened this issue · 7 comments

The README says:

  • IBM's xlf compilers: NO known good version that can build/link
    the MPI f08 bindings or build/link the OpenSHMEM Fortran bindings.

However, this no longer seems to be accurate for the MPI f08 bindings. I can confirm, though, that the OpenSHMEM Fortran bindings still do not work.

I checked the master branch with the XL compiler:
shell$ xlf -qversion
IBM XL Fortran for Linux, V15.1.5 (5725-C75, 5765-J10)
Version: 15.01.0005.0000
shell$ ./configure --disable-dlopen CC=xlc_r CXX=xlC_r FC=xlf_r
...
shell$ cd examples/
shell$ make ring_usempif08 hello_usempif08
mpifort -g ring_usempif08.f90 -o ring_usempif08
** ring   === End of Compilation 1 ===
1501-510  Compilation successful for file ring_usempif08.f90.
mpifort -g hello_usempif08.f90 -o hello_usempif08
** main   === End of Compilation 1 ===
1501-510  Compilation successful for file hello_usempif08.f90.
shell$ mpifort --showme
xlf_r -I/tmp/jjhursey/install/ompi-master/include -I/tmp/jjhursey/install/ompi-master/lib -Wl,-rpath -Wl,/tmp/jjhursey/install/ompi-master/lib -Wl,--enable-new-dtags -L/tmp/jjhursey/install/ompi-master/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
shell$ mpifort -qversion
IBM XL Fortran for Linux, V15.1.5 (5725-C75, 5765-J10)
Version: 15.01.0005.0000
shell$ mpirun -np 2 ./ring_usempif08
Process 0 sending 10 to  1 tag 201 ( 2 processes in ring)
Process 0 sent to  1
Process 0 decremented value:  9
Process 0 decremented value:  8
Process 0 decremented value:  7
Process 0 decremented value:  6
Process 0 decremented value:  5
Process 0 decremented value:  4
Process 0 decremented value:  3
Process 0 decremented value:  2
Process 0 decremented value:  1
Process 0 decremented value:  0
Process  0 exiting
Process  1 exiting
shell$ mpirun -np 2 ./hello_usempif08
Hello, world, I am  0 of  2: Open MPI v4.0.0a1, package: Open MPI jjhursey@c656f6n03 Distribution, ident: 4.0.0a1, repo rev: v2.x-dev-4209-g22631832, Unreleased developer copy
Hello, world, I am  1 of  2: Open MPI v4.0.0a1, package: Open MPI jjhursey@c656f6n03 Distribution, ident: 4.0.0a1, repo rev: v2.x-dev-4209-g22631832, Unreleased developer copy

For OpenSHMEM, not everything works:

shell$ make hello_oshmemfh
shmemfort -g hello_oshmemfh.f90 -o hello_oshmemfh
** hello_oshmem   === End of Compilation 1 ===
1501-510  Compilation successful for file hello_oshmemfh.f90.
shell$ mpirun -np 2 ./hello_oshmemfh
Hello, world, I am  0 of  2: (version: 1.3)
Hello, world, I am  1 of  2: (version: 1.3)
shell$ make ring_oshmemfh
shmemfort -g ring_oshmemfh.f90 -o ring_oshmemfh
** ring_oshmem   === End of Compilation 1 ===
1501-510  Compilation successful for file ring_oshmemfh.f90.
ring_oshmemfh.o: In function `ring_oshmem':
/tmp/jjhursey/ompi/examples/ring_oshmemfh.f90:48: undefined reference to `shmem_int8_wait_until'
make: *** [ring_oshmemfh] Error 1

Note: I wanted to file this bug to make sure folks know that the XL compiler seems to work fine with master for the MPI bindings. We can update the README to adjust the language, but it might be easy to fix the OpenSHMEM issue and remove the restriction entirely.

I don't have the cycles at the moment for this, but didn't want to lose it.

This error is also present in my testing on OpenPOWER (ppc64le) with the 20170630-beta release of the XL compilers and Open MPI 3.0.0rc1.

-bash-4.2$ ./INST/bin/mpifort -qversion
IBM XL Fortran for Linux, V16.1 (Beta 6)
Version: 16.01.0000.0000
The license for the ESP version of IBM XL Fortran for Linux, V16.1 (Beta) compiler product will expire in 103 days on Sun Oct 15 01:00:00 2017.
make -C examples/
[...]
shmemfort -g ring_oshmemfh.f90 -o ring_oshmemfh
** ring_oshmem   === End of Compilation 1 ===
1501-510  Compilation successful for file ring_oshmemfh.f90.
ring_oshmemfh.o: In function `ring_oshmem':
/autofs/nccs-svm1_home1/hargrove/OMPI/openmpi-3.0.0rc1-linux-summitdev-xlc/BLD/examples/ring_oshmemfh.f90:48: undefined reference to `shmem_int8_wait_until'
/usr/bin/ld: link errors found, deleting executable `ring_oshmemfh'
/usr/bin/sha1sum: ring_oshmemfh: No such file or directory
make[2]: *** [ring_oshmemfh] Error 1

However, I can also confirm that the MPI bindings are fine.

I have some more data that at least pins down why the link is failing.

Running configure with four different compilers on the ppc64le system shows that xlf differs from the other three:

$ grep underscore openmpi-3.0.0rc1-linux-summitdev-*/LOG/configure.log
openmpi-3.0.0rc1-linux-summitdev-enzo/LOG/configure.log:checking  external symbol convention... single underscore
openmpi-3.0.0rc1-linux-summitdev-gcc7/LOG/configure.log:checking  external symbol convention... single underscore
openmpi-3.0.0rc1-linux-summitdev-pgi/LOG/configure.log:checking  external symbol convention... single underscore
openmpi-3.0.0rc1-linux-summitdev-xlc/LOG/configure.log:checking  external symbol convention... no underscore

The object references the following symbols:

$ nm -u ring_oshmemfh.o
                 U .TOC.
                 U _xlfBeginIO
                 U _xlfEndIO
                 U _xlfExit
                 U _xlfWriteFmt
                 U my_pe
                 U num_pes
                 U shmem_int8_wait_until
                 U shmem_put8
                 U start_pes

Note that none of these have a trailing underscore, which is consistent with the configure findings.
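To make the mismatch concrete: the "external symbol convention" that configure reports is the rule a Fortran compiler uses to turn a procedure name into a linker symbol. The C sketch below is a hypothetical helper (`fortran_symbol` and `enum fortran_convention` are illustrative names, not Open MPI code) that covers only the three lowercase variants appearing in this thread: no underscore (xlf's default, per the configure log), one trailing underscore (the other three compilers), and two trailing underscores (the `__` flavor that liboshmem also exports). It is a simplification; real compilers can follow other rules, such as uppercase names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enumeration of the lowercase conventions seen in this thread. */
enum fortran_convention {
    FC_NO_UNDERSCORE,      /* xlf default, per the configure log above */
    FC_SINGLE_UNDERSCORE,  /* the gcc7, pgi, and enzo builds above */
    FC_DOUBLE_UNDERSCORE   /* the "__" flavor also exported by liboshmem */
};

/* Writes the linker symbol that a Fortran reference to `name` would
   produce under `conv` into `out` (capacity `cap` bytes).
   Returns 0 on success, -1 if the buffer is too small. */
int fortran_symbol(const char *name, enum fortran_convention conv,
                   char *out, size_t cap)
{
    const char *suffix = conv == FC_NO_UNDERSCORE     ? ""
                       : conv == FC_SINGLE_UNDERSCORE ? "_"
                                                      : "__";
    size_t len = strlen(name);
    size_t slen = strlen(suffix);

    if (len + slen + 1 > cap)
        return -1;
    /* Fortran names are case-insensitive; these compilers emit lowercase. */
    for (size_t i = 0; i < len; i++)
        out[i] = (char)tolower((unsigned char)name[i]);
    memcpy(out + len, suffix, slen + 1); /* also copies the terminating NUL */
    return 0;
}
```

Under this sketch, the xlf object asks for `shmem_int8_wait_until` (no underscore), while the only lowercase Fortran names liboshmem exports for that routine are the one- and two-underscore forms plus the `_f` implementation, which matches the undefined reference in the link error.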

However, the library contains the following shmem_int8_wait_until symbols:

$ nm liboshmem.so.40| grep \ shmem_int8_wait
0000000000084420 W shmem_int8_wait_
00000000000843c0 W shmem_int8_wait__
00000000000842e0 T shmem_int8_wait_f
0000000000084900 W shmem_int8_wait_until_
00000000000848a0 W shmem_int8_wait_until__
00000000000847c0 T shmem_int8_wait_until_f

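Note that the weak symbols all have one or two trailing underscores, and the strong symbols end in `_f`; none of them matches the plain `shmem_int8_wait_until` that the xlf object references.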

So, I suspect this should be a simple fix for somebody with the right knowledge of where the underscores are appended.

Unlike the code in ompi/mpi/fortran, it appears that oshmem/shmem/fortran doesn't support lowercase-without-trailing-underscore as a possible Fortran symbol convention at all!
So, the fix for this is going to be fairly pervasive.
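The `nm` listing earlier in this thread shows one strong `_f` implementation plus weak single- and double-underscore names for each routine. Below is a minimal sketch of one way to produce that layout in C, assuming GCC or Clang on an ELF platform (the `alias` attribute). The `demo_wait_until` entry point and its argument types are hypothetical stand-ins for a real binding, and this is not necessarily how oshmem generates its symbols. The sketch only illustrates which extra name the xlf default convention needs: the last alias, with no trailing underscore.

```c
#include <stdint.h>

/* Call counter so the aliases can be exercised; illustration only. */
int demo_calls = 0;

/* Hypothetical Fortran binding implemented once, with the "_f" suffix
   mirroring shmem_int8_wait_until_f in the nm output. Fortran passes
   arguments by reference, hence the pointer parameters. */
void demo_wait_until_f(int64_t *ivar, int *cmp, int64_t *value)
{
    (void)ivar;
    (void)cmp;
    (void)value;
    demo_calls++; /* a real binding would forward to the C implementation */
}

/* Weak aliases for the conventions liboshmem already covers:
   one and two trailing underscores. */
void demo_wait_until_(int64_t *ivar, int *cmp, int64_t *value)
    __attribute__((weak, alias("demo_wait_until_f")));
void demo_wait_until__(int64_t *ivar, int *cmp, int64_t *value)
    __attribute__((weak, alias("demo_wait_until_f")));

/* The variant that xlf's default (no underscore) convention looks for. */
void demo_wait_until(int64_t *ivar, int *cmp, int64_t *value)
    __attribute__((weak, alias("demo_wait_until_f")));
```

One caveat with this approach: some Fortran entry points (`my_pe`, `num_pes`, `start_pes`) share their lowercase names with OpenSHMEM C API functions, so a no-underscore alias for them would collide with the C symbol. That may be part of the duplicate-naming trouble mentioned in this thread.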

Workaround:

For the moment, configuring with FC="xlf -qextname" to force a trailing underscore appears to work (and also verifies the diagnosis).

@PHHargrove Thanks for tracking that down! I should be able to work up a fix for this in the next day or two.

Hmm, I took a pass at this and it became a mess. The Fortran bindings in oshmem are not as centralized as they are in ompi, so the change touches a lot of files. The no-underscore version results in a number of duplicate naming issues that need to be worked through. Changing one or two of the interfaces worked for the ring test, but applying the technique more broadly caused build issues.

I think for v3.0 the best we can do is note the workaround that Paul mentioned earlier and update the README. Then maybe someone from the oshmem community can take another pass at this. I'm not sure when I'll get cycles back for it, but I would be willing to test and help where I can.

I'll work on the README update PR. Then we can keep this ticket open for when the real fix arrives.

I revisited this bug today with the current master branch HEAD and can confirm that it is still an issue, with an additional complication.

$ xlf -qversion
IBM XL Fortran for Linux, V16.1.1 (5725-C75, 5765-J15)
Version: 16.01.0001.0006

With FC=xlf_r:

./configure --disable-dlopen CC=xlc_r CXX=xlC_r FC=xlf_r
make -j 20
make -j 20 install
$ cd examples
$ make ring_oshmemfh
shmemfort -g  ring_oshmemfh.f90  -o ring_oshmemfh
** ring_oshmem   === End of Compilation 1 ===
1501-510  Compilation successful for file ring_oshmemfh.f90.
ring_oshmemfh.o: In function `ring_oshmem':
/tmp/ompi-xl/examples/ring_oshmemfh.f90:48: undefined reference to `shmem_int8_wait_until'
make: *** [ring_oshmemfh] Error 1

With FC="xlf -qextname" (the previously proposed workaround):

./configure --disable-dlopen CC=xlc_r CXX=xlC_r FC="xlf -qextname"
make -j 20
make -j 20 install
$ cd examples
$ make ring_oshmemfh
shmemfort -g  ring_oshmemfh.f90  -o ring_oshmemfh
** ring_oshmem   === End of Compilation 1 ===
1501-510  Compilation successful for file ring_oshmemfh.f90.
ring_oshmemfh.o: In function `ring_oshmem':
/tmp/ompi-xl//examples/ring_oshmemfh.f90:24: undefined reference to `start_pes_'
/tmp/ompi-xl//examples/ring_oshmemfh.f90:25: undefined reference to `my_pe_'
/tmp/ompi-xl//examples/ring_oshmemfh.f90:26: undefined reference to `num_pes_'
/tmp/ompi-xl//examples/ring_oshmemfh.f90:35: undefined reference to `shmem_put8_'
/tmp/ompi-xl//examples/ring_oshmemfh.f90:48: undefined reference to `shmem_int8_wait_until_'
/tmp/ompi-xl//examples/ring_oshmemfh.f90:55: undefined reference to `shmem_put8_'
make: *** [ring_oshmemfh] Error 1