LLNL/Caliper

Possible static initialization fiasco situation

jrmadsen opened this issue · 8 comments

Hey David, I believe I have encountered a static initialization fiasco when calling cali_init() with tag 2.3.0. I can disable how this issue cropped up and just lazily init caliper so after reading this, feel free to close the issue, I just wanted to bring it to your attention and maybe discuss experiences so that we can find the best way to handle static initialization (since I have dealt with this issue a lot, github is noting this issue is "similar to 4 other issues", and we both, obviously, would like to find the most stable method for it).

Here is the back-trace:

cali::RuntimeConfig::RuntimeConfigImpl::init_configset (this=0x0, name=0xf2799f "memory", list=0x133ae70 <cali::MemoryPool::MemoryPoolImpl::s_configdata>) at ../external/caliper/src/common/RuntimeConfig.cpp:280
280	        if (m_database.empty())
(gdb) bt
#0  cali::RuntimeConfig::RuntimeConfigImpl::init_configset (this=0x0, name=0xf2799f "memory", list=0x133ae70 <cali::MemoryPool::MemoryPoolImpl::s_configdata>) at ../external/caliper/src/common/RuntimeConfig.cpp:280
#1  0x0000000000e6f817 in cali::RuntimeConfig::init (this=<optimized out>, name=0xf2799f "memory", list=0x133ae70 <cali::MemoryPool::MemoryPoolImpl::s_configdata>) at ../external/caliper/src/common/RuntimeConfig.cpp:358
#2  0x0000000000ec24ff in cali::MemoryPool::MemoryPoolImpl::MemoryPoolImpl (this=0x31ed700) at ../external/caliper/src/caliper/MemoryPool.cpp:92
#3  0x0000000000ec21fc in cali::MemoryPool::MemoryPool (this=0x31de200) at ../external/caliper/src/caliper/MemoryPool.cpp:129
#4  0x0000000000eb00bd in cali::MetadataTree::MetadataTreeImpl::MetadataTreeImpl (this=0x31de200) at ../external/caliper/src/caliper/MetadataTree.cpp:131
#5  0x0000000000eaf86c in cali::MetadataTree::MetadataTree (this=0x31e37a0) at ../external/caliper/src/caliper/MetadataTree.cpp:578
#6  0x0000000000e9a027 in cali::Caliper::ThreadData::ThreadData (this=0x31e37a0, initial_thread=255) at ../external/caliper/src/caliper/Caliper.cpp:298
#7  cali::Caliper::instance () at ../external/caliper/src/caliper/Caliper.cpp:1796
#8  0x0000000000e6b66e in cali_init () at ../external/caliper/src/caliper/cali.cpp:777
#9  0x0000000000c0e6c2 in tim::backend::cali::init () at ../source/timemory/components/caliper/backends.hpp:129
#10 tim::component::caliper::global_init () at ../source/timemory/components/caliper/components.hpp:57
#11 tim::impl::storage<tim::component::caliper, false>::initialize (this=0x31e36e0) at ../source/timemory/storage/definition.hpp:996
#12 tim::storage_initializer::get<tim::component::caliper>()::{lambda()#1}::operator()() const (this=<optimized out>) at ../source/timemory/operations/definition.hpp:93
#13 tim::storage_initializer::get<tim::component::caliper> () at ../source/timemory/operations/definition.hpp:90
#14 0x0000000000f02d9d in __libc_csu_init ()
#15 0x00007f4644fc5b28 in __libc_start_main (main=0x831b50 <main(int, char**)>, argc=1, argv=0x7ffe3b472948, init=0xf02d50 <__libc_csu_init>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe3b472938)
    at ../csu/libc-start.c:266
#16 0x0000000000831a4a in _start () at ../source/timemory/units.hpp:103

It is not immediately clear to me how the mP instance for init_configset is null (how it gets initialized looks fine to me) but I suspect that it may be related to this style of static initialization:

Caliper::GlobalData::S_GObject Caliper::GlobalData::gObj;

mostly because this style has caused me tons of problems in the past -- in fact, this situation only presented itself when I introduced this to the extern template builds of timemory which call cali_init():

namespace {
tim::storage_initializer storage_initializer_caliper = 
    tim::storage_initializer::get<tim::component::caliper>();
}

This might be a red-herring though because it appears those instances exist but we both know how weird static initialization of variables gets.

The method I have adopted essentially amounts to converting all of these statically initialized "variables" to function calls which return a reference to a static variable inside the function. Since those variables are scoped, calling the function guarantees that they will exist when the function returns, e.g. converting:

class foo
{
public:
    static foo* instance() { return f_instance; }
private:
    static foo* f_instance;
};
static foo* foo::instance = new foo;

to:

class foo
{
public:
    static foo* instance() { return f_instance(); }
private:
    static foo*& f_instance() {
        static foo* _instance = new foo; 
        return _instance;
    }
};

I have found that with this approach, there is no static initialization issue if instance() is called before the variable f_instance is created, thus allowing any object getting statically initialized to safely call instance() without null pointer checks all over the place.

The "problem" with this is that now the static instances are scoped and are no longer automatically initialized at runtime. So in order to maintain that original behavior (without an __attribute__((constructor)) routine that calls those functions), I create "dummy" initialization objects with either static routines or constructors which call those functions, (e.g. tim::storage_initializer):

struct storage_initializer
{
    template <typename T>
    static storage_initializer get();
};

template <typename T>
storage_initializer
storage_initializer::get()
{
    static auto _master = []() {
        auto _instance = storage<T>::master_instance();
        if(_instance)
            _instance->initialize();
        return storage_initializer{};
    }();

    static thread_local auto _worker = []() {
        auto _instance = storage<T>::instance();
        if(_instance)
            _instance->initialize();
        return storage_initializer{};
    }();

    consume_parameters(_master);
    return _worker;
}

Due to the data storage model of timemory, I require that static template function but it appears Caliper could just essentially create a single non-templated "master" initializer object that does something similar. Thoughts?

Hi Jonathan,

Thanks for the report. And I thought I finally had all these static object issues fixed :-)

One of the problems is actually with the RuntimeConfig class, but I can fix that with the instance function pattern as you suggested. That is I think what's causing the segfault in your case.

The gObj story is a bit different. Its one data member is actually initialized in the Caliper::instance() function. However, there might be an issue if the object constructor is invoked after the first Caliper::instance() and then overwrites the just-created object. I'll look into that.

Hi @jrmadsen ,

I couldn't reproduce a situation where initializing Caliper in a static object causes a crash, though I don't doubt it can happen. Can you check if that latest commit helps in your case? Also, do you have a small reproducer? Thanks.

Hey David, interestingly enough, it just moved the fiasco to another location. cali_init gets called fine but then I get a nullptr when calling cali_create_attribute. However lazy initialization is working fine though.

By the way, if you feel this is an edge case (I think it might be), de-prioritize it -- I can just let caliper lazily init without issue.

CMakeLists.txt

add_library(fiasco-library STATIC fiasco-library.cpp)
target_link_libraries(fiasco-library PUBLIC caliper)

add_executable(fiasco fiasco.cpp)
target_link_libraries(fiasco PRIVATE fiasco-library)

fiasco-library.hpp

#include <caliper/cali.h>
#include <caliper/Caliper.h>

struct storage_initializer
{
    template <typename T>
    static storage_initializer get();
};

template <typename Tp>
storage_initializer
storage_initializer::get()
{
    std::cout << "\ninitializing..." << std::endl;
    static thread_local auto _worker = []() {
        Tp::global_init();
        return storage_initializer{};
    }();

    return _worker;
}

struct caliper
{
    static void global_init();

    caliper(std::string label);
    void start();
    void stop();

private:
    cali_id_t   m_id = cali_create_attribute("fiasco", CALI_TYPE_STRING,
                                           CALI_ATTR_NESTED | CALI_ATTR_SCOPE_THREAD);
    std::string m_label;
};

fiasco-library.cpp

#include "fiasco-library.hpp"
#include <thread>
#include <chrono>

caliper::caliper(std::string label)
: m_label(label)
{}

void
caliper::global_init()
{
// #if defined(CALL_CALI_INIT)
    std::cout << "calling cali_init...\n" << std::endl;
    cali_init();
// #endif
}

void
caliper::start()
{
    CALI_MARK_FUNCTION_BEGIN;
    cali_begin_string(m_id, m_label.c_str());
    std::this_thread::sleep_for(std::chrono::seconds(1));
    CALI_MARK_FUNCTION_END;
}

void
caliper::stop()
{
    CALI_MARK_FUNCTION_BEGIN;
    std::this_thread::sleep_for(std::chrono::seconds(1));
    cali_end(m_id);
    CALI_MARK_FUNCTION_END;
}

namespace
{
storage_initializer storage_initializer_caliper = storage_initializer::get<caliper>();
}

fiasco.cpp

#include "fiasco-library.hpp"
#include <caliper/Caliper.h>

long
fib(long n)
{
    return (n < 2) ? n : (fib(n - 1) + fib(n - 2));
}

int
main()
{
    caliper f("fiasco");
    f.start();
    auto ret = fib(10);
    f.stop();
    std::cout << "\nfibonacci(40) = " << ret << '\n' << std::endl;
}

I tried converting gObj and sObj to the style I recommended and it worked with the very minor problem of Caliper not recording anything at all!

If you comment out the cali_init you get:

initializing...

fibonacci(40) = 102334155

Path       Inclusive time Exclusive time Time %    
stop             0.000006       0.000006  0.000239 
fiasco           1.508423       0.507666 20.218552 
  stop           1.000757       1.000757 39.856633 
start            1.000711       0.000004  0.000159 
  fiasco         1.000707       1.000707 39.854641 

Here is the system + compiler info + build commands in case that matters:

$ uname -a
Linux docker-desktop 4.19.76-linuxkit #1 SMP Thu Oct 17 19:31:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ c++ --version
clang version 7.0.0-3~ubuntu0.18.04.1 (tags/RELEASE_700/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ ninja -v fiasco
[1/4] /usr/bin/c++  -Dfiasco_EXPORTS  -O2 -g -DNDEBUG -fPIE   -std=c++17 -MD -MT examples/ex-caliper/CMakeFiles/fiasco.dir/fiasco.cpp.o -MF examples/ex-caliper/CMakeFiles/fiasco.dir/fiasco.cpp.o.d -o examples/ex-caliper/CMakeFiles/fiasco.dir/fiasco.cpp.o -c ../examples/ex-caliper/fiasco.cpp
[2/4] /usr/bin/c++    -O2 -g -DNDEBUG -fPIC   -std=c++17 -MD -MT examples/ex-caliper/CMakeFiles/fiasco-library.dir/fiasco-library.cpp.o -MF examples/ex-caliper/CMakeFiles/fiasco-library.dir/fiasco-library.cpp.o.d -o examples/ex-caliper/CMakeFiles/fiasco-library.dir/fiasco-library.cpp.o -c ../examples/ex-caliper/fiasco-library.cpp
[3/4] : && /opt/conda/bin/cmake -E remove libfiasco-library.a && /usr/bin/ar qc libfiasco-library.a  examples/ex-caliper/CMakeFiles/fiasco-library.dir/fiasco-library.cpp.o && /usr/bin/ranlib libfiasco-library.a && :
[4/4] : && /usr/bin/c++  -O2 -g -DNDEBUG  -Wl,--export-dynamic -rdynamic examples/ex-caliper/CMakeFiles/fiasco.dir/fiasco.cpp.o  -o fiasco  libfiasco-library.a libcaliper.a -pthread && :

I think this is a red herring! I basically did:

ltrace ./fiasco &> direct_init.log

then removed calling cali_init() and recompiled and then ran

ltrace ./fiasco &> lazy_init.log

then i did:

diff -y -W 200 direct_init.log lazy_init.log | less

and started comparing the library calls.

Basically there were no differences except for a ton of differences in the environment variables being checked (where the LHS is calling cali_init() and the RHS is lazy):

                                                                              > toupper('c')                                     = 'C'
                                                                              > toupper('a')                                     = 'A'
                                                                              > toupper('l')                                     = 'L'
                                                                              > toupper('i')                                     = 'I'
toupper('_')                                     = '_'                          toupper('_')                                     = '_'
toupper('c')                                     = 'C'                          toupper('c')                                     = 'C'
toupper('o')                                     = 'O'                          toupper('o')                                     = 'O'
toupper('n')                                     = 'N'                          toupper('n')                                     = 'N'
toupper('f')                                     = 'F'                          toupper('f')                                     = 'F'
toupper('i')                                     = 'I'                          toupper('i')                                     = 'I'
toupper('g')                                     = 'G'                          toupper('g')                                     = 'G'
toupper('_')                                     = '_'                          toupper('_')                                     = '_'
toupper('f')                                     = 'F'                          toupper('f')                                     = 'F'
toupper('i')                                     = 'I'                          toupper('i')                                     = 'I'
toupper('l')                                     = 'L'                          toupper('l')                                     = 'L'
toupper('e')                                     = 'E'                          toupper('e')                                     = 'E'
getenv("_CONFIG_FILE")                           = nil                        | getenv("CALI_CONFIG_FILE")                       = nil
                                                                              > _ZdlPv(0x15077d0, 0x15077d2, 10, 70)             = 0
strlen("file")                                   = 4                            strlen("file")                                   = 4

and then right before the LHS segfaults, the lazy init does a ton of work. Thus, I am pretty sure wherever you encode the prefix to your query for environment variables is set to "" so basically none of your services get activated but certain logical checks do get set to true when cali_init() gets called so by the time cali_create_attribute gets called, the logic says to do stuff but nothing is configured.

So, in a way, it might be a static initialization issue but not the typical one that we are used to seeing!

By the way, if you want to see how overboard I went with that style of initialization, take a look at how many times this macro:

#if !defined(TIMEMORY_MEMBER_STATIC_ACCESSOR)
#    define TIMEMORY_MEMBER_STATIC_ACCESSOR(TYPE, FUNC, ENV_VAR, INIT)                   \
    public:                                                                              \
        static TYPE& FUNC() { return instance().m__##FUNC; }                             \
                                                                                         \
    private:                                                                             \
        TYPE generate__##FUNC()                                                          \
        {                                                                                \
            auto _parse = []() { FUNC() = tim::get_env(ENV_VAR, FUNC()); };              \
            get_parse_callbacks().push_back(_parse);                                     \
            return get_env<TYPE>(ENV_VAR, INIT);                                         \
        }                                                                                \
        TYPE m__##FUNC = generate__##FUNC();
#endif

is used in where I handle my settings (it works great but has turned into a terrible idea as the project has gotten larger):
https://github.com/NERSC/timemory/blob/74fde6faa5b5000e73002d694b26c4867044349e/source/timemory/settings/declaration.hpp#L92

Hi @jrmadsen ,

Thanks for the detailed debugging work, that's awesome. I still haven't found a compiler configuration that actually triggers this, so I think I'll try your container environment. I did add some updates that should fix the missing env var prefix situation, but I don't think that's the actual cause of the crash. If you like you can give it a try, though.

Are you building caliper as a static library? I don't get the error if I link to the shared library version of caliper but I can reproduce it on macOS with Apple Clang too if I link to the static library

Ok, it does crash with static linking. I can prevent the segfault but there's still stuff not working. I'll see if I can fix that and keep you posted. Thanks.