osrf/osrf_testing_tools_cpp

Segfaults in OpenSSL static initialization with memory interpose library

emersonknapp opened this issue · 1 comments

The memory testing tools interpose library on Linux seems to be causing segfaults in some static initialization code that runs before main().

When trying out FastRTPS 1.8 (newly cut release branch) with the rest of the ROS2 stack, I am seeing errors when running tests. The errors all seem to manifest as a segmentation fault before main(), in the static initialization of OpenSSL. I've narrowed this down to a smaller reproducible case by building the FastRTPS examples, which removes the intermediate layers.
Without the interpose library, everything works fine, but with it preloaded, I consistently hit the segfault:

> cd src/eProsima/Fast-RTPS
> git checkout release/1.8.0
> mkdir build
> cd build
> cmake .. -DCMAKE_BUILD_TYPE=DEBUG -DTHIRDPARTY=ON -DCOMPILE_EXAMPLES=ON
> make -j$(nproc)
> examples/C++/HelloWorldExample/HelloWorldExample

Starting 
publisher OR subscriber argument needed

> LD_DEBUG=bindings LD_PRELOAD=$(my_ros_workspace)/install/osrf_testing_tools_cpp/lib/libmemory_tools_interpose.so examples/C++/HelloWorldExample/HelloWorldExample

... lots of output ...
       593:     binding file /usr/lib/x86_64-linux-gnu/libssl.so.1.1 [0] to /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 [0]: normal symbol `EVP_sha512' [OPENSSL_1_1_0]
       593:     binding file /usr/lib/x86_64-linux-gnu/libssl.so.1.1 [0] to /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 [0]: normal symbol `COMP_zlib' [OPENSSL_1_1_0]
       593:     binding file /usr/lib/x86_64-linux-gnu/libssl.so.1.1 [0] to /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 [0]: normal symbol `CRYPTO_mem_ctrl' [OPENSSL_1_1_0]
       593:     binding file /usr/lib/x86_64-linux-gnu/libssl.so.1.1 [0] to /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 [0]: normal symbol `OPENSSL_sk_new' [OPENSSL_1_1_0]
       593:     binding file /usr/lib/x86_64-linux-gnu/libssl.so.1.1 [0] to /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 [0]: normal symbol `COMP_get_type' [OPENSSL_1_1_0]
       593:     binding file /usr/lib/x86_64-linux-gnu/libssl.so.1.1 [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `qsort' [GLIBC_2.2.5]
Segmentation fault (core dumped)

Then I tried debugging with the interpose library preloaded.

> gdb examples/C++/HelloWorldExample/HelloWorldExample 
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from examples/C++/HelloWorldExample/HelloWorldExample...done.
(gdb) run
Starting program: /root/ros/src/eProsima/Fast-RTPS/build/examples/C++/HelloWorldExample/HelloWorldExample 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
__GI___qsort_r (b=b@entry=0x7ffff586ee00 <ssl3_ciphers>, n=n@entry=147, s=s@entry=72, cmp=cmp@entry=0x7ffff562cd60 <cipher_compare>, arg=arg@entry=0x0) at msort.c:249
249     msort.c: No such file or directory.
(gdb) bt
#0  __GI___qsort_r (b=b@entry=0x7ffff586ee00 <ssl3_ciphers>, n=n@entry=147, s=s@entry=72, cmp=cmp@entry=0x7ffff562cd60 <cipher_compare>, arg=arg@entry=0x0) at msort.c:249
#1  0x00007ffff5f406d8 in __GI_qsort (b=b@entry=0x7ffff586ee00 <ssl3_ciphers>, n=n@entry=147, s=s@entry=72, cmp=cmp@entry=0x7ffff562cd60 <cipher_compare>) at msort.c:308
#2  0x00007ffff562cebd in ssl_sort_cipher_list () at ../ssl/s3_lib.c:2735
#3  0x00007ffff5632941 in ssl_load_ciphers () at ../ssl/ssl_ciph.c:381
#4  0x00007ffff56357c2 in ossl_init_ssl_base () at ../ssl/ssl_init.c:99
#5  ossl_init_ssl_base_ossl_ () at ../ssl/ssl_init.c:25
#6  0x00007ffff689f827 in __pthread_once_slow (once_control=0x7ffff58737fc <ssl_base>, init_routine=0x7ffff5635600 <ossl_init_ssl_base_ossl_>) at pthread_once.c:116
#7  0x00007ffff689f8e5 in __GI___pthread_once (once_control=once_control@entry=0x7ffff58737fc <ssl_base>, init_routine=init_routine@entry=0x7ffff5635600 <ossl_init_ssl_base_ossl_>) at pthread_once.c:143
#8  0x00007ffff533b939 in CRYPTO_THREAD_run_once (once=once@entry=0x7ffff58737fc <ssl_base>, init=init@entry=0x7ffff5635600 <ossl_init_ssl_base_ossl_>) at ../crypto/threads_pthread.c:106
#9  0x00007ffff56358eb in OPENSSL_init_ssl (opts=0, settings=<optimized out>) at ../ssl/ssl_init.c:198
#10 0x00007ffff6ed306e in asio::ssl::detail::openssl_init_base::instance() () from /root/ros/install/fastrtps/lib/libfastrtps.so.1
#11 0x00007ffff6dcd5d4 in _GLOBAL__sub_I_TCPChannelResource.cpp () from /root/ros/install/fastrtps/lib/libfastrtps.so.1
#12 0x00007ffff7de5733 in call_init (env=0x7fffffff82b8, argv=0x7fffffff82a8, argc=1, l=<optimized out>) at dl-init.c:72
#13 _dl_init (main_map=0x7ffff7ffe170, argc=1, argv=0x7fffffff82a8, env=0x7fffffff82b8) at dl-init.c:119
#14 0x00007ffff7dd60ca in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#15 0x0000000000000001 in ?? ()
#16 0x00007fffffff84ef in ?? ()
#17 0x0000000000000000 in ?? ()

If I set breakpoints in our custom malloc during the session above, they are hit, so we are definitely using it. Beyond that, I'm not sure what to do next to figure out why we're getting static initialization segfaults when using the interpose library with FastRTPS 1.8. It's causing test failures upstream, I think in anything that calls FastRTPS, but I'm not certain that's exactly the pattern. Interestingly, in rcl, only the tests that #include "osrf_testing_tools_cpp/memory_tools/memory_tools.hpp" currently fail with a segfault before main(); the others do not. I don't know if that pattern is relevant or a coincidence.

I am investigating this problem and have a potential fix involving alignment of the memory returned by the StaticAllocator.
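
To illustrate what "alignment of the returned memory" means here, below is a minimal sketch of a bump allocator that rounds every returned pointer up to `alignof(std::max_align_t)`, the alignment callers of malloc are entitled to assume. This is not the actual StaticAllocator from osrf_testing_tools_cpp and not the proposed fix; `AlignedBumpAllocator` and `is_aligned` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-buffer bump allocator, for illustration only.
// It is NOT the StaticAllocator used by the interpose library.
template<std::size_t Capacity>
class AlignedBumpAllocator
{
public:
  // Returns a pointer aligned to `alignment` (which must be a power of two),
  // or nullptr if the remaining buffer cannot satisfy the request.
  void * allocate(
    std::size_t size,
    std::size_t alignment = alignof(std::max_align_t))
  {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_);
    const std::uintptr_t current = base + offset_;
    // Round the current position up to the next multiple of `alignment`.
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    const std::uintptr_t aligned = (current + mask) & ~mask;
    const std::size_t padded_offset = static_cast<std::size_t>(aligned - base);
    // Check capacity without risking overflow of padded_offset + size.
    if (padded_offset > Capacity || size > Capacity - padded_offset) {
      return nullptr;
    }
    offset_ = padded_offset + size;
    return reinterpret_cast<void *>(aligned);
  }

private:
  // The buffer itself is aligned so that offset 0 is a valid aligned address.
  alignas(std::max_align_t) unsigned char buffer_[Capacity];
  std::size_t offset_ = 0;
};

// Helper to check a pointer's alignment, e.g. from a gdb session or a test.
inline bool is_aligned(const void * p, std::size_t alignment)
{
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Without the rounding step, a 1-byte allocation followed by a larger one would hand back an odd address, which code compiled under the assumption of malloc-aligned memory may not tolerate. Whether that is what happens in the qsort frame above is still an open question at this point.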