LLNL/Aluminum

[1.4.1] Tests crash

Closed this issue · 1 comment

===>  Testing for Aluminum-1.4.1
===>   Aluminum-1.4.1 depends on package: cxxopts>0 - found
-- Configuring done (4.9s)
-- Generating done (0.0s)
-- Build files have been written to: /usr/ports/net/aluminum/work/.build
ninja: no work to do.
[  0% 1/1] cd /usr/ports/net/aluminum/work/.build && /usr/local/bin/ctest --force-new-ctest-process
Test project /usr/ports/net/aluminum/work/.build
No tests were found!!!
[yv:33027] *** Process received signal ***
[yv:33027] Signal: Segmentation fault (11)
[yv:33027] Signal code: Address not mapped (1)
[yv:33027] Failing at address: 0x440000c8
[yv:33027] [ 0] 0x826d6762c <pthread_sigmask+0x54c> at /lib/libthr.so.3
[yv:33027] [ 1] 0x826d66bd9 <pthread_setschedparam+0x839> at /lib/libthr.so.3
[yv:33027] [ 2] 0x7ffffffff923 <_fini+0x7fffffdd3aa7> at ???
[yv:33027] [ 3] 0x824332fe8 <MPI_Comm_get_attr+0x58> at /usr/local/mpi/openmpi/lib/libmpi.so.40
[yv:33027] [ 4] 0x821b7a4b2 <_ZN2Al8internal3mpi4initERiRPPci+0x102> at /usr/ports/net/aluminum/work/.build/src/libAl.so.1.4.1
[yv:33027] [ 5] 0x821b7735a <_ZN2Al10InitializeERiRPPci+0x1a> at /usr/ports/net/aluminum/work/.build/src/libAl.so.1.4.1
[yv:33027] [ 6] 0x20d730 <main+0x40> at /usr/ports/net/aluminum/work/.build/test/test_exchange
[yv:33027] *** End of error message ***
*** Signal 11

clang-15
FreeBSD 13.2

Unfortunately, Aluminum's tests are not actually integrated with ctest (for a variety of reasons, mainly that they all need MPI to run), which is why ctest reports "No tests were found". I suspect the crash comes from MPI failing to initialize properly without reporting an error, so that the subsequent call to MPI_Comm_get_attr segfaults.

It's a bit odd, but sadly not too surprising, that the segfault is occurring inside an MPI call, especially if MPI failed to initialize properly.
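
For reading the backtrace, frames 4 and 5 are mangled C++ names, and decoding them shows which Aluminum entry points sit directly above the MPI call. A minimal sketch, assuming a GCC- or Clang-style toolchain that ships `<cxxabi.h>`; the `demangle` helper is a hypothetical name and not part of Aluminum:

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical helper (not part of Aluminum): demangle an Itanium C++ ABI
// symbol name. Returns the input unchanged if it cannot be demangled.
inline std::string demangle(const char* mangled) {
  int status = 0;
  // __cxa_demangle returns a malloc'ed buffer, so release it with std::free.
  std::unique_ptr<char, void (*)(void*)> out(
      abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
  return (status == 0 && out) ? std::string(out.get()) : std::string(mangled);
}
```

Applied to the two symbols in the log, `_ZN2Al8internal3mpi4initERiRPPci` decodes to `Al::internal::mpi::init(int&, char**&, int)` and `_ZN2Al10InitializeERiRPPci` to `Al::Initialize(int&, char**&, int)`, so the crash is reached from `main` through `Al::Initialize` and the internal MPI setup. Running the symbols through `c++filt` on the command line gives the same result.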

Still, Aluminum should probably do a better job of detecting whether MPI has initialized successfully, so I will try to make it a bit more robust.

If you suspect a deeper issue here, please re-open or make a new issue. Thanks!