Served: Stability issues upon Boost ASIO classes destructors
The Served server randomly crashes with segmentation faults (SIGSEGV) during cleanup,
due to race conditions in the Boost ASIO io_service destructors.
The issue can be demonstrated with the new "stability" example sources (PR #53),
which iterate through 1000 creations of served::net::server with explicit stop() calls
and through 1000 creations of served::net::server without calling stop().
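The shape of that loop can be sketched roughly as follows. This is not the code from PR #53: stub_server is a hypothetical placeholder for served::net::server, and iterate is an illustrative name, so the sketch only shows where construction, the optional stop() and destruction happen in each iteration.

```cpp
#include <cassert>

// Hypothetical stub standing in for served::net::server (an assumption,
// not the real class): it only counts instances and stop() calls.
struct stub_server {
    static int live;     // instances currently alive
    static int stopped;  // explicit stop() calls seen so far
    stub_server() { ++live; }
    ~stub_server() { --live; }
    void run() {}
    void stop() { ++stopped; }
};
int stub_server::live = 0;
int stub_server::stopped = 0;

// Creates and destroys `count` servers, optionally calling stop() first.
template <typename Server>
void iterate(int count, bool call_stop) {
    for (int i = 0; i < count; ++i) {
        Server server;  // constructed fresh on every iteration
        server.run();
        if (call_stop) {
            server.stop();
        }
    }  // the destructor runs at the end of each iteration
}
```

Calling iterate<stub_server>(1000, true) and then iterate<stub_server>(1000, false) mirrors the two passes described above; with the real server, the crash would be expected at the closing brace of the loop body, where the destructor runs.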
The issue can be easily reproduced in CentOS 7.6.1810 or Debian 10 Stable Docker images:
docker run --rm -i --cap-add=SYS_PTRACE -v "${PWD}/..:${PWD}/.." -w "${PWD}" debian:stable bash <<EOF
rm -rf ../served.debian
mkdir -p ../served.debian/
cd ../served.debian/
apt update
apt install -y cmake g++ gdb libboost-dev libboost-system-dev ragel
cmake -DCMAKE_BUILD_TYPE=Debug -DSERVED_BUILD_SHARED=ON -DSERVED_BUILD_STATIC=ON -DSERVED_BUILD_EXAMPLES=ON ../served/
make -j8
gdb -q --batch -ex 'set print thread-events off' -ex 'run' -ex 'bt' ../served/bin/eg_stability
EOF
docker run --rm -i --cap-add=SYS_PTRACE -v "${PWD}/..:${PWD}/.." -w "${PWD}" centos:7.6.1810 bash <<EOF
rm -rf ../served.centos
mkdir -p ../served.centos/
cd ../served.centos/
yum install -y epel-release
yum install -y boost-devel cmake gcc-c++ gdb make ragel
cmake -DCMAKE_BUILD_TYPE=Debug -DSERVED_BUILD_SHARED=ON -DSERVED_BUILD_STATIC=ON -DSERVED_BUILD_EXAMPLES=ON ../served/
make -j8
gdb -q --batch -ex 'set print thread-events off' -ex 'run' -ex 'bt' ../served/bin/eg_stability
EOF
The backtraces of the random segmentation faults are:
Thread 1 "eg_stability" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
#0 0x0000000000000000 in ?? ()
#1 0x00007efebee6c392 in boost::asio::detail::scheduler_operation::destroy (this=0x5601ebed3520) at /usr/include/boost/asio/detail/scheduler_operation.hpp:45
#2 0x00007efebee748b1 in boost::asio::detail::op_queue_access::destroy<boost::asio::detail::scheduler_operation> (o=0x5601ebed3520) at /usr/include/boost/asio/detail/op_queue.hpp:47
#3 0x00007efebee727bc in boost::asio::detail::op_queue<boost::asio::detail::scheduler_operation>::~op_queue (this=0x5601ebed3548, __in_chrg=<optimized out>) at /usr/include/boost/asio/detail/op_queue.hpp:81
#4 0x00007efebee72d20 in boost::asio::detail::scheduler::~scheduler (this=0x5601ebed3470, __in_chrg=<optimized out>) at /usr/include/boost/asio/detail/scheduler.hpp:38
#5 0x00007efebee72d7a in boost::asio::detail::scheduler::~scheduler (this=0x5601ebed3470, __in_chrg=<optimized out>) at /usr/include/boost/asio/detail/scheduler.hpp:38
#6 0x00005601eaa18226 in boost::asio::detail::service_registry::destroy (service=0x5601ebed3470) at /usr/include/boost/asio/detail/impl/service_registry.ipp:110
#7 0x00005601eaa181eb in boost::asio::detail::service_registry::destroy_services (this=0x5601ebed3420) at /usr/include/boost/asio/detail/impl/service_registry.ipp:54
#8 0x00005601eaa182b9 in boost::asio::execution_context::destroy (this=0x7ffd8d1317c0) at /usr/include/boost/asio/impl/execution_context.ipp:46
#9 0x00005601eaa1824f in boost::asio::execution_context::~execution_context (this=0x7ffd8d1317c0, __in_chrg=<optimized out>) at /usr/include/boost/asio/impl/execution_context.ipp:35
#10 0x00007efebee8108c in boost::asio::io_context::~io_context (this=0x7ffd8d1317c0, __in_chrg=<optimized out>) at /usr/include/boost/asio/impl/io_context.ipp:55
#11 0x00007efebee7d5ee in served::net::server::~server (this=0x7ffd8d1317c0, __in_chrg=<optimized out>) at /.../served/src/served/net/server.cpp:73
#12 0x00005601eaa166b8 in test () at /.../served/src/examples/stability/main.cpp:42
#13 0x00005601eaa16844 in main () at /.../served/src/examples/stability/main.cpp:55
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
#0 0x0000000000000000 in ?? ()
#1 0x00007ffb3149c7f0 in boost::asio::detail::task_io_service_operation::destroy (this=0xdc5960) at /usr/include/boost/asio/detail/task_io_service_operation.hpp:42
#2 0x00007ffb314a4dd4 in boost::asio::detail::op_queue_access::destroy<boost::asio::detail::task_io_service_operation> (o=0xdc5960) at /usr/include/boost/asio/detail/op_queue.hpp:47
#3 0x00007ffb314a2eac in boost::asio::detail::op_queue<boost::asio::detail::task_io_service_operation>::~op_queue (this=0xdc5988, __in_chrg=<optimized out>) at /usr/include/boost/asio/detail/op_queue.hpp:81
#4 0x00007ffb314aa588 in boost::asio::detail::task_io_service::~task_io_service (this=0xdc5900, __in_chrg=<optimized out>) at /usr/include/boost/asio/detail/task_io_service.hpp:38
#5 0x00007ffb314aa5e4 in boost::asio::detail::task_io_service::~task_io_service (this=0xdc5900, __in_chrg=<optimized out>) at /usr/include/boost/asio/detail/task_io_service.hpp:38
#6 0x00007ffb3149c5c4 in boost::asio::detail::service_registry::destroy (service=0xdc5900) at /usr/include/boost/asio/detail/impl/service_registry.ipp:101
#7 0x00007ffb3149c4d8 in boost::asio::detail::service_registry::~service_registry (this=0xdc5260, __in_chrg=<optimized out>) at /usr/include/boost/asio/detail/impl/service_registry.ipp:45
#8 0x00007ffb3149f583 in boost::asio::io_service::~io_service (this=0x7ffd69c1c610, __in_chrg=<optimized out>) at /usr/include/boost/asio/impl/io_service.ipp:53
#9 0x00007ffb31498fe6 in served::net::server::~server (this=0x7ffd69c1c610, __in_chrg=<optimized out>) at /.../served/src/served/net/server.cpp:73
#10 0x00000000004097e2 in test (stop=true) at /.../served/src/examples/stability/main.cpp:42
#11 0x0000000000409934 in main () at /.../served/src/examples/stability/main.cpp:57
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f86b6600700 (LWP 1571)]
0x00007f86ba92ccc9 in boost::asio::detail::epoll_reactor::run (this=0x0, block=true, ops=...) at /usr/include/boost/asio/detail/impl/epoll_reactor.ipp:382
382 if (timer_fd_ != -1)
#0 0x00007f86ba92ccc9 in boost::asio::detail::epoll_reactor::run (this=0x0, block=true, ops=...) at /usr/include/boost/asio/detail/impl/epoll_reactor.ipp:382
#1 0x00007f86ba92e19d in boost::asio::detail::task_io_service::do_run_one (this=0x9dc8b0, lock=..., this_thread=..., ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:396
#2 0x00007f86ba92dc1b in boost::asio::detail::task_io_service::run (this=0x9dc8b0, ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:153
#3 0x00007f86ba92e5cd in boost::asio::io_service::run (this=0x7fff23ccb5a0) at /usr/include/boost/asio/impl/io_service.ipp:59
#4 0x00007f86ba928003 in served::net::server::__lambda0::operator() (__closure=0x9dd370) at /.../served/src/served/net/server.cpp:108
#5 0x00007f86ba92a020 in std::_Bind_simple<served::net::server::run(int, bool)::__lambda0()>::_M_invoke<>(std::_Index_tuple<>) (this=0x9dd370) at /usr/include/c++/4.8.2/functional:1732
#6 0x00007f86ba929f77 in std::_Bind_simple<served::net::server::run(int, bool)::__lambda0()>::operator()(void) (this=0x9dd370) at /usr/include/c++/4.8.2/functional:1720
#7 0x00007f86ba929f10 in std::thread::_Impl<std::_Bind_simple<served::net::server::run(int, bool)::__lambda0()> >::_M_run(void) (this=0x9dd358) at /usr/include/c++/4.8.2/thread:115
#8 0x00007f86ba1a9070 in ?? () from /lib64/libstdc++.so.6
#9 0x00007f86ba402dd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f86b990d02d in clone () from /lib64/libc.so.6
Hey @AdrianDC, this is awesome, thanks for putting it together. I'm not sure I'm going to have spare capacity any time soon to help look into this.
Latest version of "Stability: Add stability test example and start multithreaded fixes":
served: ensure multithreaded servers are properly released
* A class destructor is added to properly release the members
* The threads are joined upon destruction to ensure work is done
* A 1 millisecond delay is added to avoid multithreading races
* Threads are created with "bind" references rather than lambdas
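
The join-on-destruction idea from the list above can be sketched with the standard library alone. This is an illustration under assumptions, not the actual patch: worker_pool is a hypothetical stand-in for served::net::server, and an atomic flag plays the role of the io_service that the worker threads run. The 1 millisecond delay is deliberately left out, since it only narrows the race rather than removing it.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for served::net::server (an assumption).
class worker_pool {
public:
    worker_pool() : running_(false), exited_(0) {}

    // Joining here guarantees that no worker still touches the members
    // once member destruction begins.
    ~worker_pool() { stop(); }

    worker_pool(const worker_pool&) = delete;
    worker_pool& operator=(const worker_pool&) = delete;

    void run(int n_threads) {
        running_ = true;
        for (int i = 0; i < n_threads; ++i) {
            // std::bind on a member function, mirroring the "bind" bullet.
            threads_.emplace_back(std::bind(&worker_pool::work, this));
        }
    }

    // Safe to call repeatedly, and safe to skip: the destructor calls it.
    void stop() {
        running_ = false;
        for (auto& t : threads_) {
            if (t.joinable()) {
                t.join();
            }
        }
        threads_.clear();
    }

    int exited() const { return exited_; }

private:
    void work() {
        while (running_) {
            std::this_thread::sleep_for(std::chrono::microseconds(50));
        }
        ++exited_;
    }

    std::atomic<bool> running_;
    std::atomic<int> exited_;
    std::vector<std::thread> threads_;
};
```

With this layout a pool can be destroyed with or without an explicit stop(). Without the join in the destructor, destroying a still-joinable std::thread would call std::terminate, and a worker could outlive the members it uses.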
The introduction of the 1 ms delay makes the cleanup safe on the CentOS 7.6 target:
the previously shared test now runs correctly over 10000 iterations.
The Debian target still looks affected, though.
Changing how the io_service related members are allocated,
and manually releasing the object's memory, made no difference.
The only configuration without failures is the one that does not release the io_service member at all,
which leaks memory for each new running server.