Mellanox/libvma

segfault at sys_readv () from /lib64/libvma.so

Subject

segfault at sys_readv () from /lib64/libvma.so

Issue type

  • Bug report
  • Feature request

Configuration:

  • Product version: libvma 9.4
  • OS: Oracle Linux 8.3
  • OFED: MLNX_OFED_LINUX-5.4-1.0.3.0
  • Hardware: Mellanox Technologies MT27700 Family [ConnectX-4]

Actual behavior:

While running glusterd (GlusterFS) with libvma loaded via LD_PRELOAD, I see a segfault at sys_readv(). I built libvma with debugging enabled, but I am still not able to see the exact crash location inside the libvma code. This is the command I used to configure the debug build:

[root@dev-mc libvma]# ./configure --with-ofed=/usr --prefix=/usr --libdir=/usr/lib64 --includedir=/usr/include --docdir=/usr/share/doc/libvma --sysconfdir=/etc --enable-debug

Crash:

#0 0x00007f92909093f0 in sys_readv () from /lib64/libvma.so
#1 0x00007f919ecc7217 in __socket_ssl_readv (this=this@entry=0x7f9194004570, opvector=opvector@entry=0x7f9194004d08, opcount=opcount@entry=1) at socket.c:568
#2 0x00007f919ecc74ea in __socket_cached_read (opcount=1, opvector=0x7f9194004d08, this=0x7f9194004570) at socket.c:652
#3 __socket_rwv (this=this@entry=0x7f9194004570, vector=<optimized out>, count=count@entry=1, pending_vector=pending_vector@entry=0x7f9194004d48, pending_count=pending_count@entry=0x7f9194004d54, bytes=bytes@entry=0x0, write=0) at socket.c:734
#4 0x00007f919ecc84ab in __socket_readv (bytes=0x0, pending_count=0x7f9194004d54, pending_vector=0x7f9194004d48, count=1, vector=<optimized out>, this=0x7f9194004570) at socket.c:2354
#5 __socket_proto_state_machine (this=this@entry=0x7f9194004570, pollin=pollin@entry=0x7f919dfa8ef0) at socket.c:2354
#6 0x00007f919eccbda4 in socket_proto_state_machine (pollin=0x7f919dfa8ef0, this=0x7f9194004570) at socket.c:2542
#7 socket_event_poll_in (notify_handled=true, this=0x7f9194004570) at socket.c:2542
#8 socket_event_handler (event_thread_died=0 '\000', poll_err=<optimized out>, poll_out=<optimized out>, poll_in=<optimized out>, data=0x7f9194004570, gen=1, idx=2, fd=56) at socket.c:2948
#9 socket_event_handler (fd=fd@entry=56, idx=idx@entry=2, gen=gen@entry=1, data=data@entry=0x7f9194004570, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=<optimized out>, event_thread_died=0 '\000') at socket.c:2868
#10 0x00007f92903099dc in event_dispatch_epoll_handler (event=0x7f919dfa8f94, event_pool=0x5598f07fab20) at event-epoll.c:692
#11 event_dispatch_epoll_worker (data=0x5598f0e03170) at event-epoll.c:803

Expected behavior:

libvma should not segfault while running with GlusterFS.

Steps to reproduce:

  1. Start glusterd in foreground with LD_PRELOAD:

    LD_PRELOAD=libvma.so /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO -N

  2. Run a gluster CLI command to configure a gluster volume:

    gluster volume info

  3. After the CLI command runs, glusterd segfaults.

Hello @syspro4
Thank you for reporting the issue.
I think this issue might be caused by a conflict on the symbol sys_readv, which is defined in both glusterfs and libvma:

glusterfs: https://github.com/gluster/glusterfs/blob/2ff6e2d5e217ab555ff63026017151edf2ba1adf/rpc/rpc-transport/socket/src/socket.c#L557

libvma:

sys_readv_fn sys_readv;

Solution will be planned.

Thanks for the reply!
I will change the gluster code to rename the glusterfs sys_readv to new_sys_readv() and then try libvma again.
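The rename mentioned above could look roughly like the sketch below. It assumes the glusterfs wrapper simply forwards to readv(2); the real wrapper may add its own error handling, so the body here is illustrative only. The point is the name: libraries loaded through LD_PRELOAD are searched first during dynamic symbol lookup, so an application function that shares a name with a symbol exported by the preloaded library can end up bound to the preloaded one. A name that libvma does not export avoids that.

```c
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical application-side wrapper, renamed from sys_readv to
 * new_sys_readv (the name used in this thread) so that it no longer
 * shares a name with the sys_readv symbol exported by libvma.
 * Assumption: the wrapper only forwards to readv(2).
 */
ssize_t new_sys_readv(int fd, const struct iovec *iov, int iovcnt)
{
    return readv(fd, iov, iovcnt);
}
```

Every caller of the old name has to be updated as well, otherwise those call sites would still resolve to the conflicting symbol.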

I renamed the glusterfs sys_readv to new_sys_readv(), and now I can start glusterd with libvma.
But now it fails to spawn a new process (glusterfsd). glusterfsd is a daemon process that does the actual I/O to the underlying file system. Does libvma support the fork()/execvp() system calls?

In the log I see the following error messages:

[2021-11-29 22:37:07.547610 +0000] I [glusterfsd.c:2418:daemonize] 0-glusterfs: Pid of current running process is 6511
[2021-11-29 22:37:10.928985 +0000] I [socket.c:929:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 103
[2021-11-29 22:37:10.929176 +0000] E [MSGID: 101187] [event-epoll.c:429:event_register_epoll] 0-epoll: failed to add fd to epoll [{fd=102}, {epoll_fd=52}, {errno=9}, {error=Bad file descriptor}]
[2021-11-29 22:37:10.929196 +0000] W [socket.c:3779:socket_listen] 0-socket.glusterfsd: could not register socket 102 with events; closing socket
[2021-11-29 22:37:10.929218 +0000] W [rpcsvc.c:1993:rpcsvc_create_listener] 0-rpc-service: listening on transport failed

Thanks

Nice to see that the sys_readv issue can be worked around.
libvma supports the fork()/exec() case. See 24bd173 and the related test at https://github.com/Mellanox/libvma/tree/master/tests/simple_fork
VMA_TRACELEVEL=4 can be used to display VMA output.

Thanks for the reply.
But I am getting an error while running the Gluster services (glusterd and glusterfsd) in daemonized mode with libvma.
I always get the same error:

[event-epoll.c:429:event_register_epoll] 0-epoll: failed to add fd to epoll [{fd=102}, {epoll_fd=52}, {errno=9}, {error=Bad file descriptor}]

Is it possible that some FDs are closed during fork()/exec(), and hence the epoll_ctl(,EPOLL_CTL_ADD, fd, ) call fails?
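One way to narrow this down is to check the descriptor state right before the registration call. The helpers below are a minimal sketch with hypothetical names (fd_is_open, fd_closes_on_exec, try_epoll_add); they are not part of glusterfs or libvma, and they use only plain Linux calls, so they do not model what libvma does when it intercepts epoll_ctl(). Note that epoll_ctl() reports EBADF (errno=9) when either the epoll fd or the target fd is invalid, so both fd=102 and epoll_fd=52 from the log are worth checking.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>

/* Returns 1 if fd refers to an open descriptor, 0 if fcntl() reports EBADF. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Returns 1 if fd is open and has FD_CLOEXEC set, meaning the kernel
 * closes it on a successful exec*(); returns 0 otherwise. */
int fd_closes_on_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}

/* Tries to register fd for EPOLLIN on epfd.
 * Returns 0 on success, or the errno value set by epoll_ctl(). */
int try_epoll_add(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
        return errno;
    return 0;
}
```

Logging fd_is_open() for both descriptors in the parent before fork() and again in the child after exec() would show whether one of them disappears along the way, and fd_closes_on_exec() shows whether the kernel was told to close it.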

  1. Current master should no longer have the symbol conflict that was initially reported.
  2. About #969 (comment)
    Do you know whether the Gluster application uses the flow described in #816?
    Could you try VMA_TRACELEVEL=4 and check for suspicious VMA output around [event-epoll.c:429:event_register_epoll] 0-epoll: failed to add fd to epoll [{fd=102}, {epoll_fd=52}, {errno=9}, {error=Bad file descriptor}]?