GlobalArrays/ga

undefined reference to pdspev, pdspgv, fmemreq


When building ga (on a current Debian Linux system, e.g. build log at https://buildd.debian.org/status/fetch.php?pkg=ga&arch=amd64&ver=5.7-4%2Bb1&stamp=1569764063&raw=0 ), the build generates these error messages:

/bin/bash ./libtool  --tag=CC   --mode=link mpicc   -fno-aggressive-loop-optimizations -g -O2     -L/usr/lib -Wl,-z,relro -o global/testing/big.x global/testing/big.o libga.la -lm
libtool: link: mpicc -fno-aggressive-loop-optimizations -g -O2 -Wl,-z -Wl,relro -o global/testing/big.x global/testing/big.o  -L/usr/lib ./.libs/libga.a -lscalapack-openmpi -llapack -lblas /usr/lib/x86_64-linux-gnu/libarmci.a -L/usr/lib/gcc/x86_64-linux-gnu/9 -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/9/../../.. -lgfortran -lquadmath -lm
/usr/bin/ld: ./.libs/libga.a(ga_diag.o): in function `gai_diag_std_':
./global/src/ga_diag.F:196: undefined reference to `fmemreq_'
/usr/bin/ld: ./global/src/ga_diag.F:234: undefined reference to `pdspev_'
/usr/bin/ld: ./.libs/libga.a(ga_diag.o): in function `gai_diag_':
./global/src/ga_diag.F:497: undefined reference to `fmemreq_'
/usr/bin/ld: ./global/src/ga_diag.F:523: undefined reference to `pdspgv_'
/usr/bin/ld: ./.libs/libga.a(ga_diag.o): in function `gai_diag_reuse_':
./global/src/ga_diag.F:838: undefined reference to `fmemreq_'
/usr/bin/ld: ./global/src/ga_diag.F:862: undefined reference to `pdspgv_'
collect2: error: ld returned 1 exit status
make[5]: *** [Makefile:5230: global/testing/big.x] Error 1

Which library provides these functions (fmemreq, pdspev, pdspgv)?

There are two classes of problems here:

PeIGS

The first is that you cannot build GA with PeIGS unless you are getting PeIGS from NWChem. I have tried in the past to make PeIGS a standalone library apart from NWChem but was not successful. I'm sure someone could do it, but it is not clear that anyone is willing to.

./.libs/libga.a(ga_diag.o): In function `gai_diag_std_':
ga_diag.F:(.text+0xc65): undefined reference to `fmemreq_'
ga_diag.F:(.text+0xecd): undefined reference to `pdspev_'
./.libs/libga.a(ga_diag.o): In function `gai_diag_':
ga_diag.F:(.text+0x2137): undefined reference to `fmemreq_'
ga_diag.F:(.text+0x240b): undefined reference to `pdspgv_'
./.libs/libga.a(ga_diag.o): In function `gai_diag_reuse_':
ga_diag.F:(.text+0x38e1): undefined reference to `fmemreq_'
ga_diag.F:(.text+0x3bc3): undefined reference to `pdspgv_'

I do not know why you are building GA with PeIGS, particularly since you are also enabling ScaLAPACK. While PeIGS is faster than ScaLAPACK in some scenarios, I'm not sure the benefit is worth the price of solving the standalone PeIGS problem.

ARMCI-MPI

The second problem is that GA is using new APIs that are not supported by ARMCI-MPI. This is a real issue.

./.libs/libga.a(base.o): In function `gai_get_devmem':
base.c:(.text+0xdc98): undefined reference to `ARMCI_Malloc_group_memdev'
base.c:(.text+0xdcf9): undefined reference to `ARMCI_Malloc_memdev'
./.libs/libga.a(base.o): In function `pnga_destroy':
base.c:(.text+0x10955): undefined reference to `ARMCI_Free_memdev'

There are two solutions here. One is for GA to disable these APIs when ARMCI-MPI is used. That is probably a good idea anyway, since it is possible that only ComEx supports them, in which case the other ARMCI implementation has the same issue.

The other solution is for ARMCI-MPI to implement these functions. I will be tracking progress on this here: pmodels/armci-mpi#28.
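To make the first option concrete, here is a minimal sketch of a compile-time guard. It is not GA's actual code: `HAVE_ARMCI_MEMDEV` is a hypothetical configure macro, and the `sketch_*` functions are stand-ins for the real ARMCI allocators, whose prototypes (in `armci.h`) may well differ.

```c
#include <stdlib.h>

/* HYPOTHETICAL: a macro that configure would define only when the ARMCI
 * implementation provides the memdev entry points. Not taken from GA. */
/* #define HAVE_ARMCI_MEMDEV 1 */

/* Stand-in for a plain ARMCI allocation so this file compiles on its own.
 * The real call is collective and has a different prototype. */
static int sketch_malloc(void **ptr, size_t bytes)
{
    *ptr = malloc(bytes);
    return *ptr ? 0 : 1;
}

#ifdef HAVE_ARMCI_MEMDEV
/* Stand-in for a device-aware allocation (assumed shape only). */
static int sketch_malloc_memdev(void **ptr, size_t bytes, const char *device)
{
    (void)device; /* a real implementation would select the memory device */
    return sketch_malloc(ptr, bytes);
}
#endif

/* Allocate memory, using the memdev path only when it was detected at
 * configure time; otherwise fall back to the ordinary allocator. */
int sketch_get_devmem(void **ptr, size_t bytes, const char *device)
{
#ifdef HAVE_ARMCI_MEMDEV
    return sketch_malloc_memdev(ptr, bytes, device);
#else
    (void)device; /* device hint is ignored without memdev support */
    return sketch_malloc(ptr, bytes);
#endif
}
```

With a guard like this, a build against an ARMCI implementation that lacks the memdev functions never references them, so the link step has nothing left to resolve.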

Thanks for the analysis, Jeff.

In practice nwchem is the only package in Debian which currently uses GA, so I figure that's why we haven't seen runtime problems. If I understood your analysis, nwchem is supplying the missing symbols when building /usr/bin/nwchem. We're building libga as a static library. But we can stop enabling PeIGS in GA if you recommend that. Would it interfere with nwchem if we do that?

I'll keep watch on the ARMCI-MPI developments.

Thanks Jeff. To keep the package management simple, I'll probably leave PeIGS activated in Debian's GA for now, to service nwchem. If another client package for GA shows up, we can make more drastic changes then. I can leave a README note to document the situation. Perhaps someone will show up to work on stand-alone PeIGS in the meantime.

No great rush getting these patches through, go have a good Xmas rest :)

@RizzerOnGitHub I created https://github.com/pmodels/armci-mpi/releases/tag/v0.3.1-beta for you. It has the memdev API stubs required to link GA.

Please create an ARMCI-MPI issue for any follow-up discussion related to this release. Thanks!

The updated armci-mpi builds fine for Debian: https://buildd.debian.org/status/package.php?p=armci-mpi&suite=experimental

GA still gives the fmemreq error (as well as the pdspev_ error from PeIGS):

/bin/bash ./libtool  --tag=F77   --mode=link mpif90   -fdefault-integer-8 -fno-aggressive-loop-optimizations -g -O2 -fdebug-prefix-map=/home/drew/projects/debichem/build/ga=. -fstack-protector-strong      -L/usr/lib -Wl,-z,relro -o ma/testf.x ma/testf.o libga.la -lm 
libtool: link: mpif90 -fdefault-integer-8 -fno-aggressive-loop-optimizations -g -O2 -fdebug-prefix-map=/home/drew/projects/debichem/build/ga=. -fstack-protector-strong -Wl,-z -Wl,relro -o ma/testf.x ma/testf.o  -L/usr/lib ./.libs/libga.a -lscalapack-openmpi -llapack -lblas /usr/lib/x86_64-linux-gnu/libarmci.a -L/usr/lib/gcc/x86_64-linux-gnu/9 -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/9/../../.. -lgfortran -lquadmath -lm
mpicc -DHAVE_CONFIG_H -I.       -I/usr/include -Ima -I./ma -I./LinAlg/lapack+blas -Iglobal/src -I./global/src -I./global/testing -I./pario/dra -I./pario/eaf -I./pario/elio -I./pario/sf -I./ga++/src  -Igaf2c -I./gaf2c -I./tcgmsg -I./tcgmsg/tcgmsg-mpi      -Wdate-time -D_FORTIFY_SOURCE=2   -fno-aggressive-loop-optimizations -g -O2 -c -o global/testing/big.o global/testing/big.c
/bin/bash ./libtool  --tag=CC   --mode=link mpicc   -fno-aggressive-loop-optimizations -g -O2   -L/usr/lib -Wl,-z,relro -o global/testing/big.x global/testing/big.o libga.la -lm 
libtool: link: mpicc -fno-aggressive-loop-optimizations -g -O2 -Wl,-z -Wl,relro -o global/testing/big.x global/testing/big.o  -L/usr/lib ./.libs/libga.a -lscalapack-openmpi -llapack -lblas /usr/lib/x86_64-linux-gnu/libarmci.a -L/usr/lib/gcc/x86_64-linux-gnu/9 -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/9/../../.. -lgfortran -lquadmath -lm
/usr/bin/ld: ./.libs/libga.a(ga_diag.o): in function `gai_diag_std_':
./global/src/ga_diag.F:196: undefined reference to `fmemreq_'

OK, I think I can add stubs that will convert the link errors into runtime errors telling users they need to link against PeIGS from NWChem.
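For illustration only, stubs of that kind might look roughly like the sketch below. It is not the actual GA patch: the helper names are hypothetical, the symbol names simply mirror the gfortran-style names in the linker errors, and the real argument lists of `pdspev`, `pdspgv` and `fmemreq` are deliberately not reproduced.

```c
#include <stdio.h>
#include <stdlib.h>

/* HYPOTHETICAL helper: build the diagnostic text. Kept separate from the
 * aborting path so the message can be inspected without killing the process. */
int peigs_stub_message(const char *routine, char *buf, size_t len)
{
    return snprintf(buf, len,
                    "GA: %s is not available in this build; "
                    "link against PeIGS from NWChem.",
                    routine);
}

/* HYPOTHETICAL helper: print the diagnostic and terminate. */
static void peigs_stub_abort(const char *routine)
{
    char msg[256];
    peigs_stub_message(routine, msg, sizeof msg);
    fprintf(stderr, "%s\n", msg);
    abort();
}

/* Placeholder definitions that satisfy the linker. ASSUMPTION: the argument
 * lists are omitted here; a real patch should match the Fortran interfaces. */
void pdspev_(void)  { peigs_stub_abort("pdspev"); }
void pdspgv_(void)  { peigs_stub_abort("pdspgv"); }
void fmemreq_(void) { peigs_stub_abort("fmemreq"); }
```

The trade-off is that programs which never call the eigensolver routines link and run normally, while any call that does reach a stub fails loudly at runtime instead of at link time.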

On second thought, I wonder if Debian should just disable PeIGS. @edoapra do you think the cases where PeIGS is faster than ScaLAPACK are important enough to justify working around the spaghetti here?

@jeffhammond I agree with you that disabling PeIGS when ScaLAPACK is available is the right thing to do.

If Debian disables PeIGS in GA, will that affect or compromise nwchem in any way? Or does it just mean nwchem will simply use GA+ScaLAPACK?

Thanks Jeff. I'll prepare the Debian package to disable PeIGS then.