stellar/stellar-core

data race in bucket merge path

marta-lokhova opened this issue · 0 comments

It looks like core is racing on shutdown:

  • In ~ApplicationImpl, we shut down the main IO service before the worker threads are joined.
  • As a result, a background thread may still spin up a merge, which uses the IO context, while the application is in a half-shutdown state.

Glancing at the code, the background thread uses Application, BucketManager, and the application's clock (specifically its IO context, to create a new buffered stream), which appears to race with the main thread's IO context shutdown. It's not clear whether the normal merge path has the same issue, so that is worth investigating as well.
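To illustrate the ordering problem described above, here is a minimal sketch of a destructor that stops accepting work, joins its background thread, and only then tears down state that queued jobs may touch. It is not stellar-core code: `SketchApp`, `post`, `mainIOAlive`, and `runDemo` are hypothetical names, a plain `std::thread` with a queue stands in for the asio-based worker, and an atomic flag stands in for the main IO context.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for an application that owns one background worker.
// None of these names come from stellar-core.
class SketchApp
{
  public:
    SketchApp() : mWorker([this] { runWorker(); }) {}

    ~SketchApp()
    {
        // 1. Refuse new work and wake the worker so it can drain and exit.
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mStopping = true;
        }
        mCond.notify_all();
        // 2. Join BEFORE tearing down anything a queued job might touch.
        if (mWorker.joinable())
            mWorker.join();
        // 3. Only now mark the (simulated) main IO state as shut down.
        mMainIOAlive.store(false);
    }

    // Returns false once shutdown has begun, so no job is queued mid-teardown.
    bool post(std::function<void()> job)
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            if (mStopping)
                return false;
            mQueue.push(std::move(job));
        }
        mCond.notify_one();
        return true;
    }

    bool mainIOAlive() const { return mMainIOAlive.load(); }

  private:
    void runWorker()
    {
        for (;;)
        {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mMutex);
                mCond.wait(lock,
                           [this] { return mStopping || !mQueue.empty(); });
                if (mQueue.empty())
                    return; // stopping and fully drained
                job = std::move(mQueue.front());
                mQueue.pop();
            }
            job(); // run outside the lock
        }
    }

    std::mutex mMutex;
    std::condition_variable mCond;
    std::queue<std::function<void()>> mQueue;
    bool mStopping{false};
    std::atomic<bool> mMainIOAlive{true};
    std::thread mWorker; // declared last so it starts after the other members
};

struct DemoResult
{
    int ran;       // jobs that executed
    int sawDeadIO; // jobs that observed the IO state already shut down
};

// Posts n jobs, destroys the app, and reports what the jobs observed.
DemoResult runDemo(int n)
{
    std::atomic<int> ran{0};
    std::atomic<int> sawDeadIO{0};
    {
        SketchApp app;
        for (int i = 0; i < n; ++i)
        {
            app.post([&app, &ran, &sawDeadIO] {
                if (!app.mainIOAlive())
                    ++sawDeadIO;
                ++ran;
            });
        }
    } // ~SketchApp drains the queue, joins, then shuts down the IO state
    return DemoResult{ran.load(), sawDeadIO.load()};
}
```

With this ordering, `runDemo(100)` reports 100 jobs run and none that saw the IO state already shut down. One caveat, based on my reading of the report rather than a confirmed diagnosis: in a class hierarchy the vptr is typically rewritten as each base-class destructor begins, so a join placed inside a base destructor's body may already be too late for a background thread that makes virtual calls on the object.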

TSan output below:

==================
WARNING: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) (pid=39872)
  Write of size 8 at 0x0001183ca818 by main thread:
    #0 stellar::ApplicationImpl::~ApplicationImpl() ApplicationImpl.cpp:669 (stellar-core:arm64+0x1003ba918)
    #1 stellar::ApplicationLoopbackOverlay::~ApplicationLoopbackOverlay() Simulation.h:136 (stellar-core:arm64+0x100c0fa84)
    #2 std::__1::__shared_ptr_emplace<stellar::ApplicationLoopbackOverlay, std::__1::allocator<stellar::ApplicationLoopbackOverlay>>::__on_zero_shared() shared_ptr.h:324 (stellar-core:arm64+0x100c0fa24)
    #3 stellar::Simulation::Node::~Node() Simulation.h:109 (stellar-core:arm64+0x100c0e3cc)
    #4 std::__1::__tree<std::__1::__value_type<stellar::PublicKey, stellar::Simulation::Node>, std::__1::__map_value_compare<stellar::PublicKey, std::__1::__value_type<stellar::PublicKey, stellar::Simulation::Node>, std::__1::less<stellar::PublicKey>, true>, std::__1::allocator<std::__1::__value_type<stellar::PublicKey, stellar::Simulation::Node>>>::destroy(std::__1::__tree_node<std::__1::__value_type<stellar::PublicKey, stellar::Simulation::Node>, void*>*) __tree:1811 (stellar-core:arm64+0x100c0ea88)
    #5 stellar::Simulation::~Simulation() Simulation.cpp:63 (stellar-core:arm64+0x100c07afc)
    #6 stellar::Simulation::~Simulation() Simulation.cpp:51 (stellar-core:arm64+0x100c07e88)
    #7 std::__1::__shared_ptr_emplace<stellar::Simulation, std::__1::allocator<stellar::Simulation>>::__on_zero_shared() shared_ptr.h:324 (stellar-core:arm64+0x1007ae434)
    #8 C_A_T_C_H_T_E_S_T_243() HerderTests.cpp:4717 (stellar-core:arm64+0x1007587f0)
    #9 C_A_T_C_H_T_E_S_T_243() HerderTests.cpp:4594 (stellar-core:arm64+0x1007562cc)
    #10 Catch::RunContext::invokeActiveTestCase() catch.hpp:13025 (stellar-core:arm64+0x100cc0d04)
    #11 Catch::RunContext::runCurrentTest(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) catch.hpp:12998 (stellar-core:arm64+0x100cbd8c8)
    #12 Catch::RunContext::runTest(Catch::TestCase const&) catch.hpp:12759 (stellar-core:arm64+0x100cbc834)
    #13 Catch::Session::runInternal() catch.hpp:13562 (stellar-core:arm64+0x100cc56d8)
    #14 Catch::Session::run() catch.hpp:13518 (stellar-core:arm64+0x100cc481c)
    #15 stellar::runTest(stellar::CommandLineArgs const&) test.cpp:438 (stellar-core:arm64+0x100cef370)
    #16 std::__1::__function::__func<int (*)(stellar::CommandLineArgs const&), std::__1::allocator<int (*)(stellar::CommandLineArgs const&)>, int (stellar::CommandLineArgs const&)>::operator()(stellar::CommandLineArgs const&) function.h:364 (stellar-core:arm64+0x100463fd4)
    #17 stellar::handleCommandLine(int, char* const*) CommandLine.cpp:1911 (stellar-core:arm64+0x100434f34)
    #18 stellar::handleCommandLine(int, char* const*) CommandLine.cpp:1823 (stellar-core:arm64+0x1004315f4)
    #19 <null> <null> (0x00019644a0e0)

  Previous read of size 8 at 0x0001183ca818 by thread T1177:
    #0 std::__1::__packaged_task_func<stellar::FutureBucket::startMerge(stellar::Application&, unsigned int, bool, unsigned int)::$_2, std::__1::allocator<stellar::FutureBucket::startMerge(stellar::Application&, unsigned int, bool, unsigned int)::$_2>, std::__1::shared_ptr<stellar::Bucket> ()>::operator()() future:1706 (stellar-core:arm64+0x1000d9e00)
    #1 std::__1::packaged_task<std::__1::shared_ptr<stellar::Bucket> ()>::operator()() future:1969 (stellar-core:arm64+0x1000d7c30)
    #2 std::__1::__function::__func<std::__1::__bind<void (std::__1::packaged_task<std::__1::shared_ptr<stellar::Bucket> ()>::*)(), std::__1::shared_ptr<std::__1::packaged_task<std::__1::shared_ptr<stellar::Bucket> ()>>&>, std::__1::allocator<std::__1::__bind<void (std::__1::packaged_task<std::__1::shared_ptr<stellar::Bucket> ()>::*)(), std::__1::shared_ptr<std::__1::packaged_task<std::__1::shared_ptr<stellar::Bucket> ()>>&>>, void ()>::operator()() function.h:364 (stellar-core:arm64+0x1000dae3c)
    #3 asio::detail::executor_op<asio::detail::binder0<stellar::ApplicationImpl::postOnBackgroundThread(std::__1::function<void ()>&&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>)::$_45>, std::__1::allocator<void>, asio::detail::scheduler_operation>::do_complete(void*, asio::detail::scheduler_operation*, std::__1::error_code const&, unsigned long) executor_op.hpp:70 (stellar-core:arm64+0x1003c4ff4)
    #4 asio::detail::scheduler::do_run_one(asio::detail::conditionally_enabled_mutex::scoped_lock&, asio::detail::scheduler_thread_info&, std::__1::error_code const&) scheduler.ipp:492 (stellar-core:arm64+0x10142e7f0)
    #5 asio::detail::scheduler::run(std::__1::error_code&) scheduler.ipp:209 (stellar-core:arm64+0x10141ddcc)
    #6 asio::io_context::run() io_context.ipp:63 (stellar-core:arm64+0x10141dbf8)
    #7 void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, stellar::ApplicationImpl::ApplicationImpl(stellar::VirtualClock&, stellar::Config const&)::$_49>>(void*) thread.h:238 (stellar-core:arm64+0x1003c6264)

  Location is heap block of size 2032 at 0x0001183ca800 allocated by main thread:
    #0 operator new(unsigned long) <null>:60804740 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x84420)
    #1 std::__1::shared_ptr<stellar::ApplicationLoopbackOverlay> stellar::Application::create<stellar::ApplicationLoopbackOverlay, stellar::Simulation&>(stellar::VirtualClock&, stellar::Config const&, stellar::Simulation&, bool, bool) Application.h:316 (stellar-core:arm64+0x100c0f764)
    #2 stellar::Simulation::addNode(stellar::SecretKey, stellar::SCPQuorumSet, stellar::Config const*, bool, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) Simulation.cpp:140 (stellar-core:arm64+0x100c08580)
    #3 C_A_T_C_H_T_E_S_T_243() HerderTests.cpp:4611 (stellar-core:arm64+0x10075665c)
    #4 C_A_T_C_H_T_E_S_T_243() HerderTests.cpp:4594 (stellar-core:arm64+0x1007562cc)
    #5 Catch::RunContext::invokeActiveTestCase() catch.hpp:13025 (stellar-core:arm64+0x100cc0d04)
    #6 Catch::RunContext::runCurrentTest(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) catch.hpp:12998 (stellar-core:arm64+0x100cbd8c8)
    #7 Catch::RunContext::runTest(Catch::TestCase const&) catch.hpp:12759 (stellar-core:arm64+0x100cbc834)
    #8 Catch::Session::runInternal() catch.hpp:13562 (stellar-core:arm64+0x100cc56d8)
    #9 Catch::Session::run() catch.hpp:13518 (stellar-core:arm64+0x100cc481c)
    #10 stellar::runTest(stellar::CommandLineArgs const&) test.cpp:438 (stellar-core:arm64+0x100cef370)
    #11 std::__1::__function::__func<int (*)(stellar::CommandLineArgs const&), std::__1::allocator<int (*)(stellar::CommandLineArgs const&)>, int (stellar::CommandLineArgs const&)>::operator()(stellar::CommandLineArgs const&) function.h:364 (stellar-core:arm64+0x100463fd4)
    #12 stellar::handleCommandLine(int, char* const*) CommandLine.cpp:1911 (stellar-core:arm64+0x100434f34)
    #13 stellar::handleCommandLine(int, char* const*) CommandLine.cpp:1823 (stellar-core:arm64+0x1004315f4)
    #14 <null> <null> (0x00019644a0e0)

  Thread T1177 (tid=24080104, running) created by main thread at:
    #0 pthread_create <null>:60804740 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x3062c)
    #1 stellar::ApplicationImpl::ApplicationImpl(stellar::VirtualClock&, stellar::Config const&) ApplicationImpl.cpp:178 (stellar-core:arm64+0x1003b3e50)
    #2 stellar::TestApplication::TestApplication(stellar::VirtualClock&, stellar::Config const&) TestUtils.cpp:156 (stellar-core:arm64+0x100c74260)
    #3 std::__1::shared_ptr<stellar::ApplicationLoopbackOverlay> stellar::Application::create<stellar::ApplicationLoopbackOverlay, stellar::Simulation&>(stellar::VirtualClock&, stellar::Config const&, stellar::Simulation&, bool, bool) Application.h:316 (stellar-core:arm64+0x100c0f7e4)
    #4 stellar::Simulation::addNode(stellar::SecretKey, stellar::SCPQuorumSet, stellar::Config const*, bool, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) Simulation.cpp:140 (stellar-core:arm64+0x100c08580)
    #5 C_A_T_C_H_T_E_S_T_243() HerderTests.cpp:4611 (stellar-core:arm64+0x10075665c)
    #6 C_A_T_C_H_T_E_S_T_243() HerderTests.cpp:4594 (stellar-core:arm64+0x1007562cc)
    #7 Catch::RunContext::invokeActiveTestCase() catch.hpp:13025 (stellar-core:arm64+0x100cc0d04)
    #8 Catch::RunContext::runCurrentTest(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) catch.hpp:12998 (stellar-core:arm64+0x100cbd8c8)
    #9 Catch::RunContext::runTest(Catch::TestCase const&) catch.hpp:12759 (stellar-core:arm64+0x100cbc834)
    #10 Catch::Session::runInternal() catch.hpp:13562 (stellar-core:arm64+0x100cc56d8)
    #11 Catch::Session::run() catch.hpp:13518 (stellar-core:arm64+0x100cc481c)
    #12 stellar::runTest(stellar::CommandLineArgs const&) test.cpp:438 (stellar-core:arm64+0x100cef370)
    #13 std::__1::__function::__func<int (*)(stellar::CommandLineArgs const&), std::__1::allocator<int (*)(stellar::CommandLineArgs const&)>, int (stellar::CommandLineArgs const&)>::operator()(stellar::CommandLineArgs const&) function.h:364 (stellar-core:arm64+0x100463fd4)
    #14 stellar::handleCommandLine(int, char* const*) CommandLine.cpp:1911 (stellar-core:arm64+0x100434f34)
    #15 stellar::handleCommandLine(int, char* const*) CommandLine.cpp:1823 (stellar-core:arm64+0x1004315f4)
    #16 <null> <null> (0x00019644a0e0)

SUMMARY: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) ApplicationImpl.cpp:669 in stellar::ApplicationImpl::~ApplicationImpl()
==================