multicore-locks/litl

static_var pthread_to_lock is 0 after ld_preload segment map

Opened this issue · 5 comments

jdmfr commented

Model name: AMD EPYC 7763 64-Core Processor
Linux localhost.localdomain 4.18.0
glibc-version is 2.28

After I compile litl and try any lock algorithm, I find that:

[screenshot 2022-08-18 21:03:45]

because pthread_to_lock is 0.

I tried to use gdb to find out why it becomes 0 after create_clht, and eventually I found that:

[screenshot 2022-08-18 21:06:51]

Would you mind telling me how to fix this?
Maybe I should compile against an older glibc version?

Hi @jdmfr. I think one of two things is happening:

  1. A lock is created before litl is in the address space. This causes ht_lock_create not to be called, but ht_lock_get is then called on an uninitialized lock.
  2. The shared library is loaded twice (and maybe even unloaded in between?). The first time is when "Using,dormous [...]" is printed; the second time is where you get your stack trace.

I would try to put a breakpoint on real_interpose_init to see where it is called. It should be called only once.

jdmfr commented

Thanks for your answer.
After setting breakpoints, it seems that the second reason causes this.

1. gdb can't stop at the first time interpose_init is called, and after ht_lock_create is called, pthread_to_lock is a non-zero value (as in picture 2, pthread_to_lock is initialized to 0x5555...).
2. After that, before stopping at the second call of interpose_init, a SIGSEGV happens, because the function pointer pthread_mutex_lock is set to 0 (the reason is that dl_map_segments sets these variables to 0).


I run the benchmark with this command (without gdb):

LD_PRELOAD=./lib/libxxx.so my_prog

and with this command (with gdb):

gdb my_prog
set env LD_PRELOAD=./lib/libxxx.so
r

I even tried litl on different architectures (x86 and LoongArch64, both with glibc 2.28 and gcc 8.3.0), but got the same result on both.

1. How can I stop at the first call of interpose_init?
2. How can I make ld load litl only once? (Maybe LD_PRELOAD is not a good choice?)

It's a compiler-toolchain-related problem, not a lock-related one 😆.
Thanks again.

You can try to modify interpose_init to print the stack trace using something like https://www.gnu.org/software/libc/manual/html_node/Backtraces.html
That should give you an idea of why it is called.

Is it an issue only with your specific program or with any program? Can you try with an older version of glibc (using, for example, a Docker container)?

jdmfr commented

I used an older glibc (2.24) in a Docker container, and the LD_PRELOAD run executed successfully.

With "gdb ./program" and "set env LD_PRELOAD=...", interpose_init also executes twice. The main difference is:

In 2.24, before the real interpose_init executes, pthread_mutex_lock@plt still redirects to the glibc nptl library.
But in my glibc version, before interpose_init executes, pthread_mutex_lock is already redirected to the pthread_mutex_lock in interpose.c...

Unfortunately, my experiment needs to run on the new 'loongarch' architecture, but glibc only supports this architecture from version 2.28 onward. I need to find out why glibc redirects the symbol strangely on both x86 and loongarch.

Thank you .


Or, is there a better way for me to change the lock algorithm in user mode with less work?

kelark commented

I had a similar problem when I ran the upscaledb test program with the litl script. Have you solved your problem? Please let me know.
My error is as follows:
/home/lym/lym_ceshi/litl/libmcs_spinlock.sh: line 33: 17422 Segmentation fault LD_PRELOAD=$LD_PRELOAD:$BASE/lib/libmcs_spinlock.so "$@"

I tried gdb and found:
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ff5717e7a20 in mcs_mutex_create (attr=attr@entry=0x0) at mcs.c:70
#2 0x00007ff5717e6d5f in ht_lock_create (attr=0x0, mutex=0x560a976f2000) at interpose.c:141
#3 pthread_mutex_init (mutex=0x560a976f2000, attr=0x0) at interpose.c:423
#4 0x00007ff56ee40ac8 in google::protobuf::DescriptorPool::DescriptorPool(google::protobuf::DescriptorDatabase*, google::protobuf::DescriptorPool::ErrorCollector*) () from /usr/lib/x86_64-linux-gnu/libprotobuf.so.10
#5 0x00007ff56ee40ba0 in ?? () from /usr/lib/x86_64-linux-gnu/libprotobuf.so.10
#6 0x00007ff56ee0b73a in google::protobuf::GoogleOnceInitImpl(long*, google::protobuf::Closure*) () from /usr/lib/x86_64-linux-gnu/libprotobuf.so.10
#7 0x00007ff56ee386df in google::protobuf::DescriptorPool::InternalAddGeneratedFile(void const*, int) ()
from /usr/lib/x86_64-linux-gnu/libprotobuf.so.10
#8 0x00007ff56ee2f914 in google::protobuf::protobuf_AddDesc_google_2fprotobuf_2fany_2eproto() () from /usr/lib/x86_64-linux-gnu/libprotobuf.so.10
#9 0x00007ff5719ff8d3 in call_init (env=0x7ffd9fa2a358, argv=0x7ffd9fa2a348, argc=1, l=) at dl-init.c:72
#10 _dl_init (main_map=0x7ff571c1a170, argc=1, argv=0x7ffd9fa2a348, env=0x7ffd9fa2a358) at dl-init.c:119
#11 0x00007ff5719f00ca in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#12 0x0000000000000001 in ?? ()
#13 0x00007ffd9fa2b372 in ?? ()

The innermost named function is mcs_mutex_create:

mcs_mutex_t *mcs_mutex_create(const pthread_mutexattr_t *attr) {
    mcs_mutex_t *impl = (mcs_mutex_t *)alloc_cache_align(sizeof(mcs_mutex_t));
    impl->tail = 0;
#if COND_VAR
    REAL(pthread_mutex_init)(&impl->posix_lock, /* &errattr */ attr); // line 70
    DEBUG("Mutex init lock=%p posix_lock=%p\n", impl, &impl->posix_lock);
#endif

    return impl;
}