libbitcoin/libbitcoin-blockchain

[version3] Deadlock on node closing

Opened this issue · 51 comments

(Posted here by Eric indication).

Hello.
I am experimenting a deadlock when I try to close the node (bn) Ctrl-C (SIGINT, SIGTERM).

To get the bn executable I follow:

wget https://raw.githubusercontent.com/libbitcoin/libbitcoin-node/version3/install.sh
chmod +x install.sh
sudo ./install.sh --prefix=/opt/libbitcoin --build-boost --disable-shared
#Boost modified to use 1.64 because 1.57 has compilation errors (multiprecision)

Using the following configuration file:

# cat /home/ubuntu/fer/deadlock/data/deadlock-btc.cfg 

[log]
archive_directory = /home/ubuntu/fer/deadlock/data/log/archive-node-btc-testnet
debug_file = /home/ubuntu/fer/deadlock/data/log/node-btc-testnet-debug.log
error_file = /home/ubuntu/fer/deadlock/data/log/node-btc-testnet-error.log
rotation_size = 100000000
minimum_free_space = 0
verbose = true

[network]
protocol_maximum = 70013
protocol_minimum = 31402
identifier = 118034699
outbound_connections = 50
inbound_connections = 50
inbound_port = 18333
channel_expiration_minutes = 30
hosts_file = /home/ubuntu/fer/deadlock/data/hosts-btc-testnet.cache
seed = testnet-seed.bitcoin.jonasschnelli.ch:18333
seed = seed.tbtc.petertodd.org:18333
seed = testnet-seed.bluematt.me:18333
seed = testnet-seed.bitcoin.schildbach.de:18333
seed = testnet-seed.voskuil.org:18333
self = 0.0.0.0:0

[database]
directory = /home/ubuntu/fer/deadlock/data/database/btc-testnet
flush_writes = false

[blockchain]

# cores = 4

# Testnet-btc
checkpoint = 000000000933ea01ad0ee984209779baaec3ced90fa3f408719526f8d77f4943:0
checkpoint = 00000000009e2958c15ff9290d571bf9459e93b19765c6801ddeccadbb160a1e:100000
checkpoint = 0000000000287bffd321963ef05feab753ebe274e1d78b2fd4e2bfe9ad3aa6f2:200000
checkpoint = 000000000000226f7618566e70a2b5e020e29579b46743f05348427239bf41a1:300000
checkpoint = 000000000598cbbb1e79057b79eef828c495d4fc31050e6b179c57d07d00367c:400000
checkpoint = 000000000001a7c0aaa2630fbb2c0e476aafffc60f82177375b2aaa22209f606:500000
checkpoint = 000000000000624f06c69d3a9fe8d25e0a9030569128d63ad1b704bbb3059a16:600000
checkpoint = 000000000000406178b12a4dea3b27e13b3c4fe4510994fd667d7c1e6a3f4dc1:700000
checkpoint = 0000000000209b091d6519187be7c2ee205293f25f9f503f90027e25abf8b503:800000
checkpoint = 0000000000356f8d8924556e765b7a94aaebc6b5c8685dcfa2b1ee8b41acd89b:900000
checkpoint = 0000000000478e259a3eda2fafbeeb0106626f946347955e99278fe6cc848414:1000000
checkpoint = 00000000001aa0b431dc7f8fa75179b8440bdb671db5ca79e1087faff00c19d8:1050000
checkpoint = 00000000001c2fb9880485b1f3d7b0ffa9fabdfd0cf16e29b122bb6275c73db0:1100000

[fork]
easy_blocks = true
# retarget = true
bip16 = true
bip30 = true
bip34 = true
bip66 = true
bip65 = true
bip90 = true

[node]
relay_transactions=true

... and I have the following Python script to run the node multiple times:

# cat multiple_launchs.py 

from subprocess import call
from subprocess import check_output
from random import randint
from subprocess import Popen
from subprocess import CalledProcessError

import subprocess
import signal
import os
import time

def get_pid(name):
    try:
        return map(int, check_output(["pidof",name]).split())
    except CalledProcessError:
        return []

def process_is_running(name):
    return len(get_pid(name)) > 0

def launch():
    sout = open('sout-fer.log', 'w')
    serr = open('serr-fer.log', 'w')
    return (sout, serr, Popen(["./bn", "-c", "/home/ubuntu/fer/deadlock/data/deadlock-btc.cfg"], stdout=sout, stderr=serr))

def finalize(p):
    pid = p.pid
    os.kill(pid, signal.SIGINT)

def finalize_and_wait(p):
    finalize(p)
    return p.wait()


def finalize_wait_measure(p):
    finalize(p)
    start_time = time.time()
    p.wait()
    elapsed_time = time.time() - start_time
    return elapsed_time


def wait_for_termination(name):
    while process_is_running(name):
        time.sleep(10)

def main():

    stats = []

    for i in range(0, 100):
        sout, serr, p = launch()
        seconds = randint(10, 120)
        print("******************************* Sleeping for %s seconds..." % (seconds))
        time.sleep(seconds)
        print("******************************* SLEEP FINALIZED *******************************************")


        elapsed_time = finalize_wait_measure(p)
        sout.close()
        serr.close()
        print("******************************* Closing procedure elapsed time: %s seconds..." % (elapsed_time))

        stats.append([i, seconds, elapsed_time])

    print("******************************* STATS *******************************************")
    print(stats)

main()

Node chain initialization and script running:

./bn -i -c /home/ubuntu/fer/deadlock/data/deadlock-btc.cfg
python multiple_launchs.py

(the python script have to be in the same directory as bn executable)

Thanks and regards,
Fernando.

#Boost modified to use 1.64 because 1.57 has compilation errors (multiprecision)

I don't get this, our CI builds use 1.57.

Using a gcc version later than 5.x has boost warnings that shake out as errors (known boost bug corrected in later versions). A solution is to specify gcc-5 such as:

CC=gcc-5 CXX=g++-5 ./install.sh [args]

UPDATE: https://svn.boost.org/trac10/ticket/12181

Seems like a good reason to bump our boost minimum.

The root cause appears to be that priority_dispatch_.concurrent(...) must succeed whether or not stop has occurred, but the priority thread pool is suspended upon stop, precluding the dispatch of any new work. However this is guarded by blocking stop for the validation mutex.

The problem is that the guard implementation is flawed. The organize(...) methods (tx and block) must check stopped() before invocation of the validator but do not.

@fpelliccioni Thanks for troubleshooting and posting this! Please re-verify once this is merged.

On a related note, is there a max version of boost that is acceptable for upgrade? Specifically, what restrictions are there keeping us from the latest version (1.67.0 as of now) assuming proper build and smoke testing for now?

There’s no strong reason for any particular version. The rationale has been that a lower version reduces the need to install. But the reality of boost install variation means this isn’t so important. As long as there is a NuGet package and all builds are good I don’t see a problem with a higher version.

Boost has been upgraded in master #420

Status?

ok, hvae fun!

Status?

The deadlock is still happening.
I have the process (bn) blocked for more than 20 hours.
To test it, I executed the same steps described above, in an Ubuntu 16.04 with GCC 7.3.0.
In successful shutdowns, the closing time is approximately 130 seconds.

@garceri (Gerardo) told me that he also experiences these deadlocks every day when he works on libbitcoin community servers.

Odd that it is apparently common for you but that @thecodefactory has not seen it.

@garceri (Gerardo) told me that he also experiences these deadlocks every day when he works on libbitcoin community servers.

What does this mean?

@garceri (Gerardo) told me that he also experiences these deadlocks every day when he works on libbitcoin community servers.

What does this mean?

Gerardo is our devops, he regularly configures environments and he has reported that he has experienced the deadlock many times.

Odd that it is apparently common for you but that @thecodefactory has not seen it.

It is easy to reproduce with the instructions I gave.
@thecodefactory could you provide how did you test it? Thanks!

Gerardo is our devops, he regularly configures environments and he has reported that he has experienced the deadlock many times.

Ok, thanks. To eliminate the possibility that it's script related (to ensure python I/O deadlocks are not introduced), you have verified that this is easily reproducible by manually starting/stopping a version3 server?

Gerardo is our devops, he regularly configures environments and he has reported that he has experienced the deadlock many times.

Ok, thanks. To eliminate the possibility that it's script related (to ensure python I/O deadlocks are not introduced), you have verified that this is easily reproducible by manually starting/stopping a version3 server?

Gerardo has verified it manually. (I have verified it using the script).
The problem is that doing it manually is tedious, because it is probably that the first, second or third execution are correct, sometimes it occurs in execution number 50 (for example).
Check the script to see if it finds any flaw, for me it is correct.

How is it that he is working so frequently with libbitcoin server vs. bitprim server?

I have been doing some tests with libbitcoin recently and found that if you increase the number of connections (both incoming and outgoing) that seems to start causing deadlocks.

I’ve run thousands of connections and have never seen it. If I can’t trap it in a debugger it’s unlikely to be found.

I have been doing some tests with libbitcoin recently and found that if you increase the number of connections (both incoming and outgoing) that seems to start causing deadlocks.

Which values are you using? I'll try the same.

I’ve run thousands of connections and have never seen it. If I can’t trap it in a debugger it’s unlikely to be found.

You can use my script and then attach to the process.

I’ve run thousands of connections and have never seen it. If I can’t trap it in a debugger it’s unlikely to be found.

You can use my script and then attach to the process.

I could do it, but not today...

I could do it, but not today...

The backtrace can help, but if run from the script, it's not guaranteed to be a deadlock in the server. Once attached, it's stopped and inspecting the state might not tell much.

I'm not saying it can't happen, but I'm trying to find a sane way to verify this. I've been mind-numbingly starting/stopping a latest v3 bn server manually over and over while looking at other things. No shutdown errors so far.

I could do it, but not today...

The backtrace can help, but if run from the script, it's not guaranteed to be a deadlock in the server. Once attached, it's stopped and inspecting the state might not tell much.

I'm not saying it can't happen, but I'm trying to find a sane way to verify this. I've been mind-numbingly starting/stopping a latest v3 bn server manually over and over while looking at other things. No shutdown errors so far.

It is possible that running in Debug mode the error does not manifest.

he backtrace can help, but if run from the script, it's not guaranteed to be a deadlock in the server. Once attached, it's stopped and inspecting the state might not t

Do you work only in Windows?

Do you work only in Windows?

I use GNU/Linux.

It’s unlikely to differ in debug vs. release builds if a debugger is not attached. But I run both and have never seen it.

@fpelliccioni @garceri If you have access to GNU/Linux, see if you can reproduce it with this script (which is hacky, but more or less the same idea as yours, but eliminates the possibility of python pipe I/O deadlocks):

#!/bin/bash

bn_program=bin/bn
bn_config=bn.cfg.v3

while [ true ]; do
    $bn_program -c $bn_config &
    pid=$(echo $!)
    sleep 20
    kill $pid
    kill_time=$(date +"%s")
    while [ $(ps ux | grep $pid | wc -l) -eq 2 ]; do
        now_time=$(date +"%s")
        elapsed=$(($now_time - $kill_time))                           
        echo "Waiting for $pid to exit ($elapsed seconds elapsed) ..."
        sleep 5
    done
    sleep 2
done

I've had this running the past ~30 minutes or so.

@fpelliccioni @garceri If you have access to GNU/Linux, see if you can reproduce it with this script (which is hacky, but more or less the same idea as yours, but eliminates the possibility of python pipe I/O deadlocks):

#!/bin/bash

bn_program=bin/bn
bn_config=bn.cfg.v3

while [ true ]; do
    $bn_program -c $bn_config &
    pid=$(echo $!)
    sleep 20
    kill $pid
    kill_time=$(date +"%s")
    while [ $(ps ux | grep $pid | wc -l) -eq 2 ]; do
        now_time=$(date +"%s")
        elapsed=$(($now_time - $kill_time))                           
        echo "Waiting for $pid to exit ($elapsed seconds elapsed) ..."
        sleep 5
    done
    sleep 2
done

I've had this running the past ~30 minutes or so.

I am running it.
I will let you know, it could take hours.

Deadlock using the bash script.

...
Waiting for 28069 to exit (2987 seconds elapsed) ...                                                                                                                                                                                          
Waiting for 28069 to exit (2992 seconds elapsed) ... 
# ps -ax | grep bn
28069 pts/5    Sl+    0:00 /home/ubuntu/test_libbitcoin/build-libbitcoin-node/libbitcoin-node/console/bn -c /home/ubuntu/test_libbitcoin/data/deadlock-btc.cfg

Deadlock using the bash script.

Interesting, leave it running to see if it resolves, but that's definitely a longer time for exit than I've seen by far. I've been running mine since I said earlier (~6+ hours ago), still without any issue.

If it runs longer and you're able to attach to it, the backtraces could help.

Waiting for 28069 to exit (46416 seconds elapsed) ...
Waiting for 28069 to exit (46421 seconds elapsed) ...
Waiting for 28069 to exit (46426 seconds elapsed) ...

Can we say that we are a deadlock?

It appears we are reproducing what we have seen anecdotally. It happens on bitprim builds/machines but not others.

It appears we are reproducing what we have seen anecdotally. It happens on bitprim builds/machines but not others.

Ok, but is a normal machine with Ubuntu Linux, and the build is done using https://raw.githubusercontent.com/libbitcoin/libbitcoin-node/version3/install.sh

I will talk with @jujumax and @garceri to bring you access to one of those machines.

Can we say that we are a deadlock?

Send a kill signal to it and wait a bit more if it's still running, to be sure the script isn't causing some error.

The fact that it's ubuntu does concern me. There was a reported bug (on slack) that causes hangs if you don't build boost statically, specifically on ubuntu. I was able to reproduce it on a VM, but haven't followed up on that yet. I also haven't re-tested since we've upgraded boost versions.

Can you confirm that you've built your version statically?

EDIT: And yes, the hangs are on shutdown (reported as when ctrl-c was pressed).

The fact that it's ubuntu does concern me. There was a reported bug (on slack) that causes hangs if you don't build boost statically, specifically on ubuntu. I was able to reproduce it on a VM, but haven't followed up on that yet. I also haven't re-tested since we've upgraded boost versions.

Can you confirm that you've built your version statically?

EDIT: And yes, the hangs are on shutdown (reported as when ctrl-c was pressed).

I ran the install.sh script in the way described above, so according to it, I am building Boost statically.

--------------------------------------------------------------------
OS                    : Linux
PARALLEL              : 48
CC                    : 
CXX                   : 
CPPFLAGS              : 
CFLAGS                : 
CXXFLAGS              : 
LDFLAGS               : 
LDLIBS                : 
BUILD_BOOST           : yes
BUILD_DIR             : build-libbitcoin-node
PREFIX                : /opt/libbitcoin
DISABLE_SHARED        : yes
DISABLE_STATIC        : 
with_boost            : --with-boost=/opt/libbitcoin
with_pkgconfigdir     : --with-pkgconfigdir=/opt/libbitcoin/lib/pkgconfig
--------------------------------------------------------------------

I will try using DISABLE_STATIC and I will let you know.

@thecodefactory do you have a repro?

@thecodefactory do you have a repro?

No, I don't (at least not yet, using the script). I did stop running it after a couple days, but could try again a little bit more. It might be something like environment related, so it might be better to try it on my ubuntu VM to see if it's more likely there.

No luck reproducing so far, running all day on my Ubuntu 16.04.4 LTS VM.

$ uname -a
Linux ubuntu 4.4.0-128-generic #154-Ubuntu SMP Fri May 25 14:15:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It's possible something is manifesting from gcc version differences?

Several days into running the script on the Ubuntu VM, I got a hang on shutdown. @fpelliccioni Were you ever able to attach so that we can compare notes here?

(gdb) where
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000000046de13 in boost::condition_variable::wait (this=0x1256890, m=...)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/condition_variable.hpp:77
#2  0x00000000004e5d69 in boost::shared_mutex::lock (this=0x1256808)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/shared_mutex.hpp:294
#3  0x000000000063f21d in libbitcoin::prioritized_mutex::lock_high_priority (this=0x12568bc, 
    this@entry=0x1256800) at src/utility/prioritized_mutex.cpp:55
#4  0x00000000004b3180 in libbitcoin::blockchain::block_chain::stop (this=this@entry=0x1255fa0)
    at src/interface/block_chain.cpp:353
#5  0x0000000000450d49 in libbitcoin::node::full_node::stop (this=this@entry=0x1255280)
    at src/full_node.cpp:254
#6  0x0000000000451266 in libbitcoin::node::full_node::close (this=0x1255280) at src/full_node.cpp:271
#7  0x000000000044123b in libbitcoin::node::executor::run (this=this@entry=0x7fffe3eb5c50)
    at console/executor.cpp:213
#8  0x0000000000441588 in libbitcoin::node::executor::menu (this=this@entry=0x7fffe3eb5c50)
    at console/executor.cpp:179
#9  0x000000000040f602 in libbitcoin::main (argc=3, argv=0x7fffe3eb5fd8) at console/main.cpp:45
#10 0x00007fb82feec830 in __libc_start_main (main=0x40f690 <main(int, char**)>, argc=3, 
    argv=0x7fffe3eb5fd8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffe3eb5fc8) at ../csu/libc-start.c:291
#11 0x000000000043d969 in _start ()

THREAD 2

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000000046de13 in boost::condition_variable::wait (this=0x1259b48, m=...)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/condition_variable.hpp:77
#2  0x000000000048179d in boost::shared_mutex::unlock_upgrade_and_lock (this=this@entry=0x1259a68)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/shared_mutex.hpp:503
#3  0x00000000004ff275 in libbitcoin::database::memory_map::reserve (this=0x1259a08, size=4796908, 
    expansion=<optimized out>) at src/memory/memory_map.cpp:360
#4  0x00000000004ff521 in libbitcoin::database::memory_map::reserve (this=<optimized out>, 
    size=<optimized out>) at src/memory/memory_map.cpp:332
#5  0x0000000000500a8b in libbitcoin::database::record_manager::new_records (
    this=this@entry=0x1259ba0, count=count@entry=1) at src/primitives/record_manager.cpp:133
#6  0x00000000004eaf1e in libbitcoin::database::block_database::write_position (
    this=this@entry=0x1259350, position=position@entry=290581446, height=599612)
    at src/databases/block_database.cpp:270
#7  0x00000000004ebbe2 in libbitcoin::database::block_database::store (this=0x1259350, block=..., 
    height=height@entry=599612) at src/databases/block_database.cpp:215
#8  0x00000000004e04e4 in libbitcoin::database::data_base::handle_push_transactions(std::error_code const&, std::shared_ptr<libbitcoin::message::block const>, unsigned long, std::function<void (std::error_code const&)>) (this=0x12563b8, ec=..., block=std::shared_ptr (count 18, weak 0) 0x7fb7101c77e0, 
    height=599612, handler=...) at src/data_base.cpp:738
... more omitted ...

THREAD 3

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000006480a9 in boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (lock=..., this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/posix_event.hpp:106
#2  boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., 
    this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:380
#3  boost::asio::detail::task_io_service::run (ec=..., this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:149
#4  boost::asio::io_service::run (this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/impl/io_service.ipp:59
#5  libbitcoin::threadpool::<lambda()>::operator() (__closure=<optimized out>, 
    __closure=<optimized out>) at src/utility/threadpool.cpp:89
#6  boost::detail::thread_data<libbitcoin::threadpool::spawn_once(libbitcoin::thread_priority)::<lambda()> >::run(void) (this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/detail/thread.hpp:116
#7  0x0000000000700095 in thread_proxy ()
#8  0x00007fb830d466ba in start_thread (arg=0x7fb82f6ca700) at pthread_create.c:333
#9  0x00007fb82ffd341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109


THREAD 4

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000006480a9 in boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (lock=..., this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/posix_event.hpp:106
#2  boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., 
    this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:380
#3  boost::asio::detail::task_io_service::run (ec=..., this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:149
#4  boost::asio::io_service::run (this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/impl/io_service.ipp:59
#5  libbitcoin::threadpool::<lambda()>::operator() (__closure=<optimized out>, 
    __closure=<optimized out>) at src/utility/threadpool.cpp:89
#6  boost::detail::thread_data<libbitcoin::threadpool::spawn_once(libbitcoin::thread_priority)::<lambda()> >::run(void) (this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/detail/thread.hpp:116
#7  0x0000000000700095 in thread_proxy ()
#8  0x00007fb830d466ba in start_thread (arg=0x7fb82f6ca700) at pthread_create.c:333
#9  0x00007fb82ffd341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

THREAD 5

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000006480a9 in boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (lock=..., this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/posix_event.hpp:106
#2  boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., 
    this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:380
#3  boost::asio::detail::task_io_service::run (ec=..., this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:149
#4  boost::asio::io_service::run (this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/impl/io_service.ipp:59
#5  libbitcoin::threadpool::<lambda()>::operator() (__closure=<optimized out>, 
    __closure=<optimized out>) at src/utility/threadpool.cpp:89
#6  boost::detail::thread_data<libbitcoin::threadpool::spawn_once(libbitcoin::thread_priority)::<lambda()> >::run(void) (this=<optimized out>)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/detail/thread.hpp:116
#7  0x0000000000700095 in thread_proxy ()
#8  0x00007fb830d466ba in start_thread (arg=0x7fb82e6c8700) at pthread_create.c:333
#9  0x00007fb82ffd341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

THREAD 6

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000000047a533 in boost::condition_variable::wait (m=..., this=0x1259d28)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/condition_variable.hpp:77
#2  boost::shared_mutex::lock_shared (this=0x1259cf8)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/shared_mutex.hpp:191
#3  boost::shared_lock<boost::shared_mutex>::lock (this=0x7fb72e0695e0)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/lock_types.hpp:645
#4  0x00000000004ea4a5 in boost::shared_lock<boost::shared_mutex>::shared_lock (m_=..., 
    this=0x7fb72e0695e0)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/lock_types.hpp:520
#5  libbitcoin::database::block_database::read_position (this=this@entry=0x1259350, 
    height=height@entry=372306) at src/databases/block_database.cpp:289
#6  0x00000000004ea632 in libbitcoin::database::block_database::get (this=0x1259350, 
    height=height@entry=372306) at src/databases/block_database.cpp:150
#7  0x00000000004b80a3 in libbitcoin::blockchain::block_chain::fetch_locator_block_headers(std::shared_ptr<libbitcoin::message::get_headers const>, std::array<unsigned char, 32ul> const&, unsigned long, std::function<void (std::error_code const&, std::shared_ptr<libbitcoin::message::headers>)>) const (
    this=0x1255fa0, locator=..., threshold=..., limit=<optimized out>, handler=...)
    at src/interface/block_chain.cpp:835
#8  0x000000000049599d in libbitcoin::node::protocol_block_out::handle_receive_get_headers (
    this=0x7fb7280085b0, ec=..., message=std::shared_ptr (count 4, weak 0) 0x7fb7101c9ae0)
    at src/protocols/protocol_block_out.cpp:159
... more omitted ...


THREAD 7

#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007fb83086ba16 in std::__atomic_futex_unsigned_base::_M_futex_wait_until(unsigned int*, unsigned int, bool, std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00000000004bba00 in std::__atomic_futex_unsigned<2147483648u>::_M_load_and_test_until (__ns=..., 
    __s=..., __has_timeout=<optimized out>, __mo=<optimized out>, __equal=<optimized out>, 
    __operand=<optimized out>, __assumed=<optimized out>, this=<optimized out>)
    at /usr/include/c++/5/bits/atomic_futex.h:104
#3  std::__atomic_futex_unsigned<2147483648u>::_M_load_and_test (__mo=<optimized out>, 
    __equal=<optimized out>, __operand=<optimized out>, __assumed=<optimized out>, 
    this=<optimized out>) at /usr/include/c++/5/bits/atomic_futex.h:122
#4  std::__atomic_futex_unsigned<2147483648u>::_M_load_when_equal (__mo=std::memory_order_acquire, 
    __val=1, this=0x7fb7100009a0) at /usr/include/c++/5/bits/atomic_futex.h:162
#5  std::__future_base::_State_baseV2::wait (this=0x7fb710000990) at /usr/include/c++/5/future:322
#6  std::__basic_future<std::error_code>::_M_get_result (this=<optimized out>)
    at /usr/include/c++/5/future:681
#7  std::future<std::error_code>::get (this=<optimized out>) at /usr/include/c++/5/future:760
#8  libbitcoin::blockchain::block_organizer::organize(std::shared_ptr<libbitcoin::message::block const>, std::function<void (std::error_code const&)>) (this=this@entry=0x1256f00, 
    block=std::shared_ptr (count 18, weak 0) 0x7fb7101c77e0, handler=...)
    at src/pools/block_organizer.cpp:123
#9  0x00000000004b45f2 in libbitcoin::blockchain::block_chain::organize(std::shared_ptr<libbitcoin::message::block const>, std::function<void (std::error_code const&)>) (this=0x1255fa0, 
    block=std::shared_ptr (count 18, weak 0) 0x7fb7101c77e0, handler=...)
    at src/interface/block_chain.cpp:1031
#10 0x0000000000488b4f in libbitcoin::node::protocol_block_in::handle_receive_block (
    this=0x7fb728007600, ec=..., message=std::shared_ptr (count 18, weak 0) 0x7fb7101c77e0)
    at src/protocols/protocol_block_in.cpp:343
... more omitted ...

THREAD 8

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000000047aba3 in boost::condition_variable::wait (m=..., this=0x7fb7101b9628)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/condition_variable.hpp:77
#2  boost::shared_mutex::lock (this=0x7fb7101b95a0)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/shared_mutex.hpp:294
#3  boost::unique_lock<boost::shared_mutex>::lock (this=0x7fb72d067c40)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/lock_types.hpp:346
#4  0x000000000059fa05 in boost::unique_lock<boost::shared_mutex>::unique_lock (m_=..., 
    this=0x7fb72d067c40)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/lock_types.hpp:124
#5  libbitcoin::resubscriber<std::error_code, std::shared_ptr<libbitcoin::message::block const> >::do_invoke (this=0x7fb7101b9550, args#0=..., args#1=std::shared_ptr (empty) 0x0)
    at /scrubbed/path/libbitcoin/version3/install/include/bitcoin/bitcoin/impl/utility/resubscriber.ipp:133
#6  0x00000000005956ed in std::_Mem_fn_base<void (libbitcoin::resubscriber<std::error_code, std::shared_ptr<libbitcoin::message::block const> >::*)(std::error_code, std::shared_ptr<libbitcoin::message::block const>), true>::_M_call<std::shared_ptr<libbitcoin::resubscriber<std::error_code, std::shared_ptr<libbitcoin::message::block const> > >&, std::error_code&, std::shared_ptr<libbitcoin::message::block const>&>
    (__ptr=std::shared_ptr (count 2, weak 1) 0x7fb7101b9550, this=0x7fb72d067ce0)
    at /usr/include/c++/5/functional:634
... more omitted ...

THREAD 9

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000000000047aba3 in boost::condition_variable::wait (m=..., this=0x7fb7101b9628)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/condition_variable.hpp:77
#2  boost::shared_mutex::lock (this=0x7fb7101b95a0)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/pthread/shared_mutex.hpp:294
#3  boost::unique_lock<boost::shared_mutex>::lock (this=0x7fb72d067c40)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/lock_types.hpp:346
#4  0x000000000059fa05 in boost::unique_lock<boost::shared_mutex>::unique_lock (m_=..., 
    this=0x7fb72d067c40)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/thread/lock_types.hpp:124
#5  libbitcoin::resubscriber<std::error_code, std::shared_ptr<libbitcoin::message::block const> >::do_invoke (this=0x7fb7101b9550, args#0=..., args#1=std::shared_ptr (empty) 0x0)
    at /scrubbed/path/libbitcoin/version3/install/include/bitcoin/bitcoin/impl/utility/resubscriber.ipp:133
#6  0x00000000005956ed in std::_Mem_fn_base<void (libbitcoin::resubscriber<std::error_code, std::shared_ptr<libbitcoin::message::block const> >::*)(std::error_code, std::shared_ptr<libbitcoin::message::block const>), true>::_M_call<std::shared_ptr<libbitcoin::resubscriber<std::error_code, std::shared_ptr<libbitcoin::message::block const> > >&, std::error_code&, std::shared_ptr<libbitcoin::message::block const>&>
    (__ptr=std::shared_ptr (count 2, weak 1) 0x7fb7101b9550, this=0x7fb72d067ce0)
    at /usr/include/c++/5/functional:634


THREAD 10

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000586338 in boost::asio::detail::posix_event::wait<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > (lock=..., this=0x7fb7280012c8)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/posix_event.hpp:106
#2  boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., 
    this=0x7fb728001270)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:380
#3  boost::asio::detail::task_io_service::run (this=0x7fb728001270, ec=...)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/task_io_service.ipp:149
#4  0x0000000000587519 in boost::asio::io_service::run (this=0x7fb728002070)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/impl/io_service.ipp:59
#5  boost::asio::detail::resolver_service_base::work_io_service_runner::operator() (
    this=0x7fb7280027c8)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/resolver_service_base.ipp:32
#6  boost::asio::detail::posix_thread::func<boost::asio::detail::resolver_service_base::work_io_service_runner>::run (this=0x7fb7280027c0)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/posix_thread.hpp:82
#7  boost::asio::detail::boost_asio_detail_posix_thread_function (arg=0x7fb7280027c0)
    at /scrubbed/path/libbitcoin/version3/install/include/boost/asio/detail/impl/posix_thread.ipp:64
#8  0x00007fb830d466ba in start_thread (arg=0x7fb727fff700) at pthread_create.c:333
#9  0x00007fb82ffd341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

I still have not ever seen this. Note also: libbitcoin/libbitcoin-server#360

@thecodefactory How many peers were configured for the above trace?

@thecodefactory How many peers were configured for the above trace?

Good question, it's been a while so I don't recall. I had archived the VM it was running on, but will try to restore it to find out.

@thecodefactory How many peers were configured for the above trace?

Ok, it was a v3 run using the above shell script with a testnet configuration. 0 inbound, 2 outbound, 1 manual peer. I'll update and rebuild the v3 code and see if I can get this test re-started with latest.

Great, 3 was the right answer.