UO-OACISS/apex

Segfault in APEX 2.5.1 with HPX 1.8.0

severinstrobl opened this issue · 4 comments

APEX 2.5.1, the version built by default with HPX 1.8.0, results in a segfault when using the MPI parcelport:

#0  MPI_Isend (buf=buf@entry=0x7fffb0018408, count=count@entry=1, datatype=<optimized out>, dest=0, tag=tag@entry=1, comm=0x55555592add0, request=0x7fffb0018678) at _deps/apex-src/src/apex/apex_mpi.cpp:59
#1  0x00007ffff76a7b50 in hpx::parcelset::policies::mpi::receiver_connection<hpx::parcelset::policies::mpi::parcelport>::send_release_tag (this=0x7fffb0018400, num_thread=18446744073709551615)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/full/parcelport_mpi/include/hpx/parcelport_mpi/receiver_connection.hpp:189
#2  0x00007ffff76a8738 in hpx::parcelset::policies::mpi::receiver_connection<hpx::parcelset::policies::mpi::parcelport>::receive_transmission_chunks (num_thread=18446744073709551615, this=0x7fffb0018400)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/full/parcelport_mpi/include/hpx/parcelport_mpi/receiver_connection.hpp:118
#3  hpx::parcelset::policies::mpi::receiver_connection<hpx::parcelset::policies::mpi::parcelport>::receive (num_thread=18446744073709551615, this=0x7fffb0018400)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/full/parcelport_mpi/include/hpx/parcelport_mpi/receiver_connection.hpp:73
#4  hpx::parcelset::policies::mpi::receiver<hpx::parcelset::policies::mpi::parcelport>::receive_messages (connection=..., this=0x555555955e80)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/full/parcelport_mpi/include/hpx/parcelport_mpi/receiver.hpp:81
#5  hpx::parcelset::policies::mpi::receiver<hpx::parcelset::policies::mpi::parcelport>::background_work (this=this@entry=0x555555955e80)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/full/parcelport_mpi/include/hpx/parcelport_mpi/receiver.hpp:72
#6  0x00007ffff76a9d4c in hpx::parcelset::policies::mpi::parcelport::background_work (num_thread=3, mode=<optimized out>, this=0x555555955b70)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/full/parcelport_mpi/src/parcelport_mpi.cpp:195
#7  hpx::parcelset::policies::mpi::parcelport::background_work (mode=<optimized out>, num_thread=3, this=0x555555955b70)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/full/parcelport_mpi/src/parcelport_mpi.cpp:180
#8  hpx::parcelset::parcelport_impl<hpx::parcelset::policies::mpi::parcelport>::do_background_work_impl (mode=<optimized out>, num_thread=3, this=0x555555955b70)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/full/parcelset/include/hpx/parcelset/parcelport_impl.hpp:467
#9  hpx::parcelset::parcelport_impl<hpx::parcelset::policies::mpi::parcelport>::do_background_work (this=0x555555955b70, num_thread=3, mode=<optimized out>)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/full/parcelset/include/hpx/parcelset/parcelport_impl.hpp:326
#10 0x00007ffff76ae9a0 in hpx::parcelset::parcelhandler::do_background_work (this=<optimized out>, num_thread=num_thread@entry=3, stop_buffering=stop_buffering@entry=true, 
    mode=mode@entry=hpx::parcelset::parcelport_background_mode_all) at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/full/parcelset/src/parcelhandler.cpp:378
#11 0x00007ffff777f186 in hpx::parcelset::do_background_work (num_thread=num_thread@entry=3, mode=mode@entry=hpx::parcelset::parcelport_background_mode_all)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/full/runtime_distributed/src/runtime_distributed.cpp:1861
#12 0x00007ffff777f1a2 in hpx::detail::network_background_callback (num_thread=3)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/full/runtime_distributed/src/runtime_distributed.cpp:157
#13 0x00007ffff6ad4cba in hpx::util::detail::basic_function<bool (unsigned long), true, false>::operator()(unsigned long) const (vs#0=<optimized out>, this=<optimized out>)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/core/functional/include/hpx/functional/detail/basic_function.hpp:228
#14 hpx::util::detail::deferred<hpx::function<bool (unsigned long), false>, hpx::util::pack_c<unsigned long, 0ul>, unsigned long>::operator()() (this=<optimized out>)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/core/functional/include/hpx/functional/deferred_call.hpp:86
#15 hpx::util::detail::callable_vtable<bool ()>::_invoke<hpx::util::detail::deferred<hpx::function<bool (unsigned long), false>, hpx::util::pack_c<unsigned long, 0ul>, unsigned long> >(void*) (f=<optimized out>)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/core/functional/include/hpx/functional/detail/vtable/callable_vtable.hpp:93
#16 0x00007ffff6ad58ff in hpx::util::detail::basic_function<bool (), false, false>::operator()() const (this=<optimized out>)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/core/functional/include/hpx/functional/detail/basic_function.hpp:228
#17 hpx::threads::detail::create_background_thread<hpx::threads::policies::local_priority_queue_scheduler<std::mutex, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_lifo> >(hpx::threads::policies::local_priority_queue_scheduler<std::mutex, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_lifo>&, hpx::threads::detail::scheduling_callbacks&, std::shared_ptr<bool>&, hpx::threads::thread_schedule_hint, long&)::{lambda(hpx::threads::thread_restart_state)#1}::operator()(hpx::threads::thread_restart_state) const (this=0x7fffb000ef30) at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/core/thread_pools/include/hpx/thread_pools/scheduling_loop.hpp:442
#18 hpx::util::detail::callable_vtable<std::pair<hpx::threads::thread_schedule_state, hpx::threads::thread_id> (hpx::threads::thread_restart_state)>::_invoke<hpx::threads::detail::create_background_thread<hpx::threads::policies::local_priority_queue_scheduler<std::mutex, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_lifo> >(hpx::threads::policies::local_priority_queue_scheduler<std::mutex, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_fifo, hpx::threads::policies::lockfree_lifo>&, hpx::threads::detail::scheduling_callbacks&, std::shared_ptr<bool>&, hpx::threads::thread_schedule_hint, long&)::{lambda(hpx::threads::thread_restart_state)#1}>(void*, hpx::threads::thread_restart_state&&) (f=0x7fffb000ef30, vs#0=<optimized out>)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/core/functional/include/hpx/functional/detail/vtable/callable_vtable.hpp:93
#19 0x00007ffff6a174c1 in hpx::util::detail::basic_function<std::pair<hpx::threads::thread_schedule_state, hpx::threads::thread_id> (hpx::threads::thread_restart_state), false, false>::operator()(hpx::threads::thread_restart_state) const (vs#0=<optimized out>, this=0x7fffb000f310)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/core/functional/include/hpx/functional/detail/basic_function.hpp:228
#20 hpx::threads::coroutines::detail::coroutine_impl::operator() (this=0x7fffb000f2a8)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/core/coroutines/src/detail/coroutine_impl.cpp:74
#21 0x00007ffff6a16969 in hpx::threads::coroutines::detail::lx::trampoline<hpx::threads::coroutines::detail::coroutine_impl> (fun=<optimized out>)
    at /tmp/stro_se/spack-stage/spack-stage-hpx-1.8.0-rzoabzjncqvqt65kv4rpuzrnbvra2a25/spack-src/libs/core/coroutines/include/hpx/coroutines/detail/context_linux_x86.hpp:179

The error is triggered when shutting down the HPX runtime via hpx::finalize(). Switching to an older version of APEX (tested with 2.4.1) does not result in a segfault.
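
For context, the shutdown path in question is reached from the usual HPX entry points. A minimal program shape (a sketch only, not the actual reproducer) looks like this:

#include <hpx/hpx_init.hpp>

int hpx_main(int argc, char* argv[])
{
    // Distributed work exercising the MPI parcelport would go here.
    // Returning hpx::finalize() initiates the runtime shutdown, during
    // which the parcelports still flush remaining background work; this
    // is where the backtrace above originates (frames #5 through #12).
    return hpx::finalize();
}

int main(int argc, char* argv[])
{
    // Starts the HPX runtime and invokes hpx_main on an HPX thread.
    return hpx::init(argc, argv);
}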

I'll see if I can reproduce the issue with any of the 'official' examples.

The issue can be reproduced by running the HPX example from examples/quickstart/partitioned_vector_spmd_foreach.cpp using multiple MPI ranks.

khuck commented

Interesting... line 59 of apex_mpi.cpp performs a division to estimate the effective bandwidth; it's possible the profiler object's get_elapsed() is returning zero. Come to think of it, since MPI_Isend is non-blocking, this is a meaningless measurement anyway. It might be better to remove it entirely, as in the sketch below.
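
For illustration, the pattern being described looks roughly like the following PMPI-style interposition wrapper. This is a hedged sketch, not the actual APEX source; the guard, the variable names, and the use of std::chrono are assumptions made for the example. The point it demonstrates: MPI_Isend returns as soon as the send is initiated, so the elapsed time can be zero at timer resolution, and bytes/elapsed does not measure real bandwidth in any case.

#include <mpi.h>
#include <chrono>

int MPI_Isend(const void* buf, int count, MPI_Datatype datatype, int dest,
    int tag, MPI_Comm comm, MPI_Request* request)
{
    // Compute the message size in bytes from the datatype and count.
    int type_size = 0;
    MPI_Type_size(datatype, &type_size);
    double const bytes = static_cast<double>(type_size) * count;

    // Time the call, then forward to the real implementation via PMPI.
    auto const start = std::chrono::steady_clock::now();
    int const ret = PMPI_Isend(buf, count, datatype, dest, tag, comm, request);
    double const elapsed = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();

    // Guard the division: elapsed can be zero because MPI_Isend only
    // *initiates* the transfer. Even with the guard, the quotient is not
    // a bandwidth, so dropping the estimate entirely is the cleaner fix.
    if (elapsed > 0.0)
    {
        double const bandwidth = bytes / elapsed;  // bytes per second
        (void) bandwidth;  // the real code would hand this to the profiler
    }
    return ret;
}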

khuck commented

Fixed by b647e0e.

@khuck Thanks for the quick fix!