LLNL/GOTCHA

Wraps different symbol version


GOTCHA currently does not honor symbol versions used in glibc.

Case in point: I wanted to wrap the pthread_cond_* family of functions. In glibc, however, these functions exist in two symbol versions:

  • GLIBC_2.2.5
  • GLIBC_2.3.2

Unfortunately, with glibc 2.31 (Ubuntu 20.04), GOTCHA wraps pthread_cond_init to __pthread_cond_init_2_0 but pthread_cond_broadcast to __pthread_cond_broadcast. That is, one wrappee comes from the old ABI and the other from the new ABI, which breaks at runtime.

The reason seems to be that the order of the symbols in libpthread-2.31.so is:

   267: 0000000000010610    31 FUNC    GLOBAL DEFAULT   16 pthread_cond_init@GLIBC_2.2.5
   268: 000000000000ed90    58 FUNC    GLOBAL DEFAULT   16 pthread_cond_init@@GLIBC_2.3.2
   205: 0000000000010290   894 FUNC    GLOBAL DEFAULT   16 pthread_cond_broadcast@@GLIBC_2.3.2
   206: 00000000000107c0   101 FUNC    GLOBAL DEFAULT   16 pthread_cond_broadcast@GLIBC_2.2.5
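To illustrate why table order matters, here is a minimal sketch in C. It assumes a lookup that returns the first name match, which is what the symptoms above suggest but is not taken from the GOTCHA source. The struct toy_sym type, the table, and both helper functions are hypothetical; the version indices 2 and 3 are made up, while the offsets are copied from the listing above. The one real ELF detail used is that bit 15 (0x8000) of a .gnu.version entry marks a non-default ("@") version, whereas the default ("@@") version has it clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit 15 of a .gnu.version entry: set for non-default ("@") versions. */
#define VERSYM_HIDDEN 0x8000u

/* Hypothetical stand-in for a dynamic symbol plus its version entry. */
struct toy_sym {
    const char *name;
    uint16_t    versym;  /* version index (made up here) plus hidden bit */
    uintptr_t   value;   /* offset within the library */
};

/* Same order as the listing above; version indices 2 and 3 are assumptions. */
static const struct toy_sym table[] = {
    { "pthread_cond_init",      2 | VERSYM_HIDDEN, 0x10610 }, /* @GLIBC_2.2.5  */
    { "pthread_cond_init",      3,                 0xed90  }, /* @@GLIBC_2.3.2 */
    { "pthread_cond_broadcast", 3,                 0x10290 }, /* @@GLIBC_2.3.2 */
    { "pthread_cond_broadcast", 2 | VERSYM_HIDDEN, 0x107c0 }, /* @GLIBC_2.2.5  */
};
static const size_t table_len = sizeof table / sizeof table[0];

/* Version-unaware lookup: returns whichever entry happens to come first. */
static const struct toy_sym *first_match(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < table_len; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Version-aware lookup: only accepts the default ("@@") version. */
static const struct toy_sym *default_match(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < table_len; i++)
        if (strcmp(table[i].name, name) == 0 &&
            !(table[i].versym & VERSYM_HIDDEN))
            return &table[i];
    return NULL;
}
```

With this table, first_match yields the old pthread_cond_init (offset 0x10610) but the new pthread_cond_broadcast (0x10290), while default_match yields the GLIBC_2.3.2 entry for both.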

Here is my testcase:

#include <stdio.h>   /* printf */
#include <unistd.h>  /* sleep */
#include <pthread.h>
#include <gotcha/gotcha.h>

static int
my_cond_init(pthread_cond_t* cond, const pthread_condattr_t* attr)
{
    printf("my_cond_init\n");
    return 0;
}

static int
my_cond_broadcast(pthread_cond_t* cond)
{
    printf("my_cond_bcast\n");
    return 0;
}

int
main(int ac, char *av[])
{
    gotcha_wrappee_handle_t cond_init_handle;
    gotcha_wrappee_handle_t cond_bcast_handle;

    struct gotcha_binding_t wrap_actions [] = {
        { "pthread_cond_init", my_cond_init, &cond_init_handle },
        { "pthread_cond_broadcast", my_cond_broadcast, &cond_bcast_handle },
    };
    gotcha_wrap(wrap_actions, sizeof(wrap_actions)/sizeof(struct gotcha_binding_t), "my_tool_name");

    printf("%p\n", gotcha_get_wrappee(cond_init_handle));
    printf("%p\n", gotcha_get_wrappee(cond_bcast_handle));

    sleep(10); // break/interrupt here

    pthread_cond_t cond;
    pthread_cond_init(&cond, NULL);
    pthread_cond_broadcast(&cond);

    return 0;
}

When compiled and linked against master, you get this:

$ gdb ./test
(gdb) run
0x7ffff7f77610
0x7ffff7f77290
^C
(gdb) p (void(*)())0x7ffff7f77610
$1 = (void (*)()) 0x7ffff7f77610 <__pthread_cond_init_2_0>
(gdb) p (void(*)())0x7ffff7f77290
$2 = (void (*)()) 0x7ffff7f77290 <__pthread_cond_broadcast>

With this fix, the output is:

0x7ffff7f78d90
0x7ffff7f7a290
^C
(gdb) p (void(*)())0x7ffff7f78d90
$1 = (void (*)()) 0x7ffff7f78d90 <__pthread_cond_init>
(gdb) p (void(*)())0x7ffff7f7a290
$2 = (void (*)()) 0x7ffff7f7a290 <__pthread_cond_broadcast>

I.e., with the patch, GOTCHA resolves both functions to the same (latest) version.

@mplegendre would this issue also apply to the elf_hash_symbol? Based on the documentation here, I believe it could.

I'm not sure what you're asking here.

Does symbol versioning apply only to GNU hash lookups, or also to ELF hash lookups? The place I am asking about is lookup_elf_hash_symbol.

I didn't read anything in here about the ELF hash symbol. It's actually completely silent about the hash tables, which makes complete sense now. Based on the glibc release history, symbol versioning was introduced in 1999 and the GNU hash table in 2006. Therefore I conclude that you are correct and the versioning must also be applied to lookup_elf_hash_symbol. Will update…
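For illustration, a version-aware walk of a SysV-style ELF hash chain could look roughly like the sketch below. This is not GOTCHA's lookup_elf_hash_symbol: the toy symbol table, build_hash, and lookup_versioned are hypothetical, and the version indices are made up. Only the hash function is the standard System V one. Because the hash covers the name alone, both versions of a symbol land in the same chain, so the walk has to look at the version entry to pick the default one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VERSYM_HIDDEN 0x8000u  /* set for non-default ("@") versions */
#define NSYMS   5
#define NBUCKET 3

/* Hypothetical stand-in for a .dynsym entry plus its .gnu.version entry. */
struct toy_sym { const char *name; uint16_t versym; uintptr_t value; };

/* Index 0 is reserved for the undefined symbol and ends every chain. */
static const struct toy_sym symtab[NSYMS] = {
    { "",                       0,                 0       },
    { "pthread_cond_init",      2 | VERSYM_HIDDEN, 0x10610 },
    { "pthread_cond_init",      3,                 0xed90  },
    { "pthread_cond_broadcast", 3,                 0x10290 },
    { "pthread_cond_broadcast", 2 | VERSYM_HIDDEN, 0x107c0 },
};
static uint32_t bucket[NBUCKET];
static uint32_t chain[NSYMS];

/* Standard System V ELF hash; note that it only sees the name. */
static uint32_t elf_hash(const char *name)
{
    uint32_t h = 0, g;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        h = (h << 4) + *p;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* Fill bucket/chain arrays in the style of a DT_HASH table. */
static void build_hash(void)
{
    memset(bucket, 0, sizeof bucket);
    memset(chain, 0, sizeof chain);
    for (uint32_t i = 1; i < NSYMS; i++) {
        uint32_t b = elf_hash(symtab[i].name) % NBUCKET;
        chain[i] = bucket[b];  /* push onto the front of the chain */
        bucket[b] = i;
    }
}

/* Walk the chain; prefer the default version, else fall back to any match. */
static const struct toy_sym *lookup_versioned(const char *name)
{
    assert(name != NULL);
    const struct toy_sym *fallback = NULL;
    for (uint32_t i = bucket[elf_hash(name) % NBUCKET]; i != 0; i = chain[i]) {
        if (strcmp(symtab[i].name, name) != 0)
            continue;
        if (!(symtab[i].versym & VERSYM_HIDDEN))
            return &symtab[i];       /* default ("@@") version found */
        if (fallback == NULL)
            fallback = &symtab[i];   /* remember a hidden version */
    }
    return fallback;
}
```

Call build_hash() once before the first lookup. Falling back to a hidden version when no default exists is a design choice of this sketch, not something the thread specifies.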