SWIFTSIM/SWIFT

Can't compile swift with MPI VELOCIraptor

Findus23 opened this issue · 10 comments

Hello,

I am trying to compile SWIFT with VELOCIraptor. According to https://swift.dur.ac.uk/docs/VELOCIraptorInterface/stfwithswift.html this should be rather straightforward, yet the build fails no matter which combination of options I try.
I am using a regular desktop PC running Debian Testing.

The steps to reproduce from fresh clones are:

git clone https://github.com/ICRAR/VELOCIraptor-STF.git
cd VELOCIraptor-STF
git rev-parse HEAD # returns dc6d330eef60b7ca10e029d9a9af434454575daa
mkdir build-sp
cd build-sp
cmake ../ -DVR_USE_HYDRO=ON -DVR_USE_SWIFT_INTERFACE=ON -DCMAKE_CXX_FLAGS="-fPIC" -DCMAKE_BUILD_TYPE=Release -DVR_MPI=OFF
make
cd ..
mkdir build-mp
cd build-mp
cmake ../ -DVR_USE_HYDRO=ON -DVR_USE_SWIFT_INTERFACE=ON -DCMAKE_CXX_FLAGS="-fPIC" -DCMAKE_BUILD_TYPE=Release -DVR_MPI=ON
make
cd ../../
git clone https://gitlab.cosma.dur.ac.uk/swift/swiftsim.git
cd swiftsim
git rev-parse HEAD # returns 25a7aaa4cb35c42cbee9e7ae78c48eb10a7844c5
./autogen.sh
autoreconf --version # returns autoreconf (GNU Autoconf) 2.71
./configure --enable-fof --with-velociraptor=/home/lukas/git/VELOCIraptor-STF/build-sp/src --with-velociraptor-mpi=/home/lukas/git/VELOCIraptor-STF/build-mp/src
make

The compilation then halts with this MPI error in libvelociraptor.a:

libtool: link: mpicc -I../src -I../argparse -I/usr/include -I/usr/include/hdf5/serial -fopenmp -DWITH_MPI "-DENGINE_POLICY=engine_policy_keep | engine_policy_setaffinity" -O3 -fomit-frame-pointer -malign-double -fstrict-aliasing -ffast-math -funroll-loops -march=amdfam10 -mavx2 -pthread -fopenmp -fopenmp -Wall -Wextra -Wno-unused-parameter -Wshadow -Werror -Wstrict-prototypes -o swift_mpi swift_mpi-main.o  -L/usr/lib/x86_64-linux-gnu/hdf5/serial ../src/.libs/libswiftsim_mpi.a ../argparse/.libs/libargparse.a -L/home/lukas/git/VELOCIraptor-STF/build-mp/src -lvelociraptor -lmpi -lstdc++ -lgsl -lgslcblas -lhdf5_hl -lhdf5 -lcrypto -lcurl -lsz -lz -ldl -lfftw3_threads -lfftw3 -lnuma -lpthread -lm -pthread -fopenmp
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o): in function `MPI::Op::Init(void (*)(void const*, void*, int, MPI::Datatype const&), bool)':
swiftinterface.cxx:(.text._ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb[_ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb]+0x19): undefined reference to `ompi_mpi_cxx_op_intercept'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o): in function `MPI::Intracomm::Clone() const':
swiftinterface.cxx:(.text._ZNK3MPI9Intracomm5CloneEv[_ZNK3MPI9Intracomm5CloneEv]+0x2c): undefined reference to `MPI::Comm::Comm()'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o): in function `MPI::Graphcomm::Clone() const':
swiftinterface.cxx:(.text._ZNK3MPI9Graphcomm5CloneEv[_ZNK3MPI9Graphcomm5CloneEv]+0x27): undefined reference to `MPI::Comm::Comm()'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o): in function `MPI::Cartcomm::Sub(bool const*) const':
swiftinterface.cxx:(.text._ZNK3MPI8Cartcomm3SubEPKb[_ZNK3MPI8Cartcomm3SubEPKb]+0x7e): undefined reference to `MPI::Comm::Comm()'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o): in function `MPI::Intracomm::Create_graph(int, int const*, int const*, bool) const':
swiftinterface.cxx:(.text._ZNK3MPI9Intracomm12Create_graphEiPKiS2_b[_ZNK3MPI9Intracomm12Create_graphEiPKiS2_b]+0x2e): undefined reference to `MPI::Comm::Comm()'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o): in function `MPI::Cartcomm::Clone() const':
swiftinterface.cxx:(.text._ZNK3MPI8Cartcomm5CloneEv[_ZNK3MPI8Cartcomm5CloneEv]+0x27): undefined reference to `MPI::Comm::Comm()'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o):swiftinterface.cxx:(.text._ZNK3MPI9Intracomm11Create_cartEiPKiPKbb[_ZNK3MPI9Intracomm11Create_cartEiPKiPKbb]+0x93): more undefined references to `MPI::Comm::Comm()' follow
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o):(.data.rel.ro._ZTVN3MPI8DatatypeE[_ZTVN3MPI8DatatypeE]+0x78): undefined reference to `MPI::Datatype::Free()'
/usr/bin/ld: /home/lukas/git/VELOCIraptor-STF/build-mp/src/libvelociraptor.a(swiftinterface.cxx.o):(.data.rel.ro._ZTVN3MPI3WinE[_ZTVN3MPI3WinE]+0x48): undefined reference to `MPI::Win::Free()'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:800: swift_mpi] Error 1
make[2]: Leaving directory '/home/lukas/tmp/swiftsim/examples'
make[1]: *** [Makefile:525: all-recursive] Error 1
make[1]: Leaving directory '/home/lukas/tmp/swiftsim'
make: *** [Makefile:457: all] Error 2

(in case you need the full log or any other additional information, I can share it too)
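
When a link step fails with many near-identical messages like the ones above, it can help to reduce the log to the distinct missing symbols. The following is a minimal sketch, not part of SWIFT or VELOCIraptor: the function name missing_symbols and the file name link.log are made up for illustration, and it assumes GNU ld's "undefined reference to" message format.

```shell
# Hypothetical helper: list each distinct symbol that GNU ld reported as undefined.
# $1 is a file holding the captured build output.
missing_symbols() {
  sed -n "s/.*undefined reference to [\`']\(.*\)'\$/\1/p" "$1" | LC_ALL=C sort -u
}

# Demo on two lines abbreviated from the log above. In a real build the log
# could be captured with something like: LANG=C make 2>&1 | tee link.log
# (link.log is an assumed file name).
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
swiftinterface.cxx:(.text+0x2c): undefined reference to `MPI::Comm::Comm()'
swiftinterface.cxx:(.text+0x27): undefined reference to `MPI::Comm::Comm()'
EOF
missing_symbols "$demo_log"
rm -f "$demo_log"
```

If every symbol in the resulting list lives in the MPI:: namespace or starts with ompi_mpi_cxx_, that points at the MPI C++ bindings rather than at SWIFT or VELOCIraptor code.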

I'm no expert on Open MPI, but everything obvious seems correct to me:

  • it is using mpicc for compilation
  • it is linking mpi (-lmpi)
  • build-mp/src/libvelociraptor.a was built with mpi enabled
  • swift without mpi compiled correctly at this point
  • there is only one version of openmpi installed on my computer (the one from libopenmpi-dev (4.1.2-2))

Still, I may well be missing something obvious (which could then perhaps also be added to the docs).

I also found https://gitlab.cosma.dur.ac.uk/swift/swiftsim/-/issues/780, so I am wondering whether that change broke the setup explained in the docs.

Hi, these all look like errors from linking C++ code with a C compiler. I'm not sure why that part has changed (it has nothing to do with #780). Try the following:

cd examples
make MPICC=mpicxx CC=mpicxx

I expect that will work. Not sure how we can fix this permanently.

Many thanks for the response. I assume you mean setting MPICC=mpicxx CC=mpicxx for the swift build (not VELOCIraptor).

In that case (whether inside examples or not) I get a lot of errors that look like even more of a C++/C mix-up:

➜  ~/swiftsim/examples LANG=C make -j MPICC=mpicxx CC=mpicxx
mpicxx -DHAVE_CONFIG_H -I. -I..     -I../src -I../argparse -I/usr/include -I/usr/include/hdf5/serial     -fopenmp  -DENGINE_POLICY="engine_policy_keep | engine_policy_setaffinity" -O3 -fomit-frame-pointer -malign-double -fstrict-aliasing -ffast-math -funroll-loops -march=amdfam10 -mavx2 -pthread -fopenmp -fopenmp -Wall -Wextra -Wno-unused-parameter -Wshadow -Werror -Wstrict-prototypes -MT swift-main.o -MD -MP -MF .deps/swift-main.Tpo -c -o swift-main.o `test -f 'main.c' || echo './'`main.c
make: *** No rule to make target '../src/.libs/libswiftsim.a', needed by 'swift'.  Stop.
make: *** Waiting for unfinished jobs....
cc1plus: error: command-line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++ [-Werror]
In file included from ../src/kernel_hydro.h:38,
                 from ../src/cell.h:41,
                 from ../src/active.h:26,
                 from ../src/swift.h:26,
                 from main.c:45:
../src/dimension.h: In function 'vector pow_dimension_vec(vector)':
../src/dimension.h:355:50: error: no matching function for call to 'vector::vector(__m256)'
  355 |   return (vector)(vec_mul(vec_mul(x.v, x.v), x.v));
      |                                                  ^
In file included from ../src/dimension.h:33,
                 from ../src/kernel_hydro.h:38,
                 from ../src/cell.h:41,
                 from ../src/active.h:26,
                 from ../src/swift.h:26,
                 from main.c:45:
[...]
../src/cache.h:1010:34: error: narrowing conversion of '-((2.0e+0 * ((double)((const cell*)cj)->cell::width[2])) + ((double)max_dx))' from 'double' to 'float' [-Werror=narrowing]
 1010 |                                  -(2. * cj->width[2] + max_dx)};
      |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../src/cell.h:41,
                 from ../src/active.h:26,
                 from ../src/swift.h:26,
                 from main.c:45:
../src/kernel_hydro.h: At global scope:
../src/kernel_hydro.h:479:21: error: 'cubic_1_dwdx_const_c2' defined but not used [-Werror=unused-variable]
  479 | static const vector cubic_1_dwdx_const_c2 = FILL_VEC(0.f);
      |                     ^~~~~~~~~~~~~~~~~~~~~
../src/kernel_hydro.h:475:21: error: 'cubic_1_const_c2' defined but not used [-Werror=unused-variable]
  475 | static const vector cubic_1_const_c2 = FILL_VEC(0.f);
      |                     ^~~~~~~~~~~~~~~~
../src/kernel_hydro.h:447:21: error: 'kernel_ivals_vec' defined but not used [-Werror=unused-variable]
  447 | static const vector kernel_ivals_vec = FILL_VEC((float)kernel_ivals);
      |                     ^~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make: *** [Makefile:872: swift-main.o] Error 1

Yes, just in the examples directory of SWIFT. So build normally until you hit the error you reported above, then cd into examples and run make with CC and MPICC redefined.

It looks like you haven't built the rest of SWIFT first. SWIFT itself doesn't build with C++ as the compiler unless you disable the hand vectorization and stop compiler warnings being treated as errors, and even then it may not be 100% happy.

BTW, this is all I see when I do this:

> make MPICC=mpicxx CC=mpicxx
/bin/bash ../libtool  --tag=CC   --mode=link mpicxx  -I../src -I../argparse -I/usr/include -I/usr/include/hdf5/serial   E_POLICY="engine_policy_keep | engine_policy_setaffinity" -O3 -fomit-frame-pointer -malign-double -fstrict-aliasing -ffalake -mavx2 -pthread -fopenmp -fopenmp -Wall -Wextra -Wno-unused-parameter -Wshadow -Werror -Wstrict-prototypes -L/usr/llpthread -lpthread -lm  -o swift_mpi swift_mpi-main.o ../src/.libs/libswiftsim_mpi.a ../argparse/.libs/libargparse.a    t-tests/temp/VELOCIraptor-STF/build-mp/src -lvelociraptor -lmpi -lstdc++ -lhdf5 -lgsl -lgslcblas -lhdf5_hl -lhdf5  -lpthreads -lfftw3 -lnuma        -lpthread -lm 
libtool: link: mpicxx -I../src -I../argparse -I/usr/include -I/usr/include/hdf5/serial -fopenmp -DWITH_MPI "-DENGINE_POLolicy_setaffinity" -O3 -fomit-frame-pointer -malign-double -fstrict-aliasing -ffast-math -funroll-loops -march=skylake --Wall -Wextra -Wno-unused-parameter -Wshadow -Werror -Wstrict-prototypes -o swift_mpi swift_mpi-main.o  -L/usr/lib/x86_6ibs/libswiftsim_mpi.a ../argparse/.libs/libargparse.a -L../..//swift-tests/temp/VELOCIraptor-STF/builtdc++ -lgsl -lgslcblas -lhdf5_hl -lhdf5 -lsz -lz -ldl -lfftw3_threads -lfftw3 -lnuma -lpthread -lm -pthread -fopenmp

so only the linking part is done using C++.

Ah, I misunderstood your comment. Indeed, if I run

cd examples
make MPICC=mpicxx CC=mpicxx

after the aborted main make run (so everything from the first post), it works and looks as it does for you.

So this workaround works fine for me, thanks.

Good. It seems this is all caused by pulling in the C++ interface of OpenMPI, which doesn't have C linkage and so requires that the C++ compiler does the linking.
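
One way to see the difference between the two wrappers is to compare the libraries each would link. This is a sketch under assumptions: it relies on Open MPI's wrapper compilers, which accept --showme:libs (other MPI implementations use different options), and only_in_second is a hypothetical helper name.

```shell
# Hypothetical helper: print the words of the second list that are absent
# from the first list, one per line.
only_in_second() {
  for lib in $2; do
    case " $1 " in
      *" $lib "*) ;;                 # also in the first list, skip it
      *) printf '%s\n' "$lib" ;;     # only in the second list
    esac
  done
}

# Assumes Open MPI wrappers; does nothing if mpicc or mpicxx is not installed.
if command -v mpicc >/dev/null 2>&1 && command -v mpicxx >/dev/null 2>&1; then
  only_in_second "$(mpicc --showme:libs)" "$(mpicxx --showme:libs)"
fi
```

On an Open MPI installation that ships the C++ bindings, the extra entry would be expected to be the C++ MPI library, which the failing mpicc link line does not include.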

@jchelly you are the most likely to have tried this before. Is this new, or have we never built against the MPI version of VR before?

This has definitely worked in the past on Cosma with the Intel 2018 compiler because I have an EAGLE-XL L0075N1128 run with velociraptor which completed. I haven't tried it since the separate MPI/no MPI configure options were added.

When we were trying to run EAGLE-XL on Irene I had to use a few extra flags. From the notes I put on the gitlab wiki:

# Ensure we get the right compiler run time libraries
export LDFLAGS="-L${MPI_ROOT}/lib/ -L${C_INTEL_ROOT}/lib/intel64/ -cxxlib"

# If IPO is enabled we need to link all dependencies explicitly
export LIBS="-lopen-rte -lopen-pal -lmpi_cxx"

So it did need a bit of help finding the C++ MPI library.

Thanks. I expect the important part is -lmpi_cxx. So this also works:

export LIBS="-lmpi++"
./configure ...

I've updated the documentation to advise also linking the C++ MPI library if symbols like these are reported as missing. Please close this issue if you are happy now. Thanks for the report.
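
Put together with the configure options from the original report, the fix might look like the sketch below. The variables VR_ROOT and MPI_CXX_LIB are assumptions introduced here for illustration; the library name is -lmpi++ as in this thread, while the Irene notes above used -lmpi_cxx.

```shell
# Sketch: configure SWIFT so the MPI C++ library is linked explicitly.
# VR_ROOT is an assumed location of the VELOCIraptor-STF checkout; adjust it.
VR_ROOT="${VR_ROOT:-$HOME/git/VELOCIraptor-STF}"

# LIBS is picked up by configure and added to the link lines.
# MPI_CXX_LIB is a hypothetical override for systems where the library
# goes by another name (for example -lmpi_cxx).
export LIBS="${MPI_CXX_LIB:--lmpi++}"

# Only run inside a SWIFT source tree where ./configure has been generated.
if [ -x ./configure ]; then
  ./configure --enable-fof \
    --with-velociraptor="$VR_ROOT/build-sp/src" \
    --with-velociraptor-mpi="$VR_ROOT/build-mp/src" &&
  make
else
  echo "no ./configure here; run ./autogen.sh in the SWIFT tree first" >&2
fi
```

LIBS has to be set before configure runs, so an already configured tree needs to be reconfigured for the change to take effect.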

Indeed, with LIBS=-lmpi++ it's working exactly as expected.
Thanks for the help!