Mellanox/sockperf

sockperf crashes with a segmentation fault when using libvma

g199209 opened this issue · 13 comments

Commit 4716b150645180735fdbf816adede0a39dc64e86 ("issue 3109144: Adding Unix Domain Socket support in SOCK_STREAM and SOCK_DGRAM") caused the bug.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000957c15 in __gnu_cxx::__exchange_and_add (__val=0xffffffff, __mem=0xfffffffffffffff8) at /opt/rh/devtoolset-11/root/usr/include/c++/11/ext/atomicity.h:66
66	  { return __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL); }
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.3.x86_64 libgcc-4.8.5-36.el7.x86_64 libibverbs-41mlnx1-OFED.4.5.0.1.0.45101.x86_64 libnl3-3.2.28-4.el7.x86_64 librdmacm-41mlnx1-OFED.4.2.0.1.3.45101.x86_64 libstdc++-4.8.5-36.el7.x86_64 numactl-libs-2.0.9-7.el7.x86_64
gef> bt
#0  0x0000000000957c15 in __gnu_cxx::__exchange_and_add (__val=0xffffffff, __mem=0xfffffffffffffff8) at /opt/rh/devtoolset-11/root/usr/include/c++/11/ext/atomicity.h:66
#1  __gnu_cxx::__exchange_and_add_dispatch (__val=0xffffffff, __mem=0xfffffffffffffff8) at /opt/rh/devtoolset-11/root/usr/include/c++/11/ext/atomicity.h:101
#2  std::string::_Rep::_M_dispose (__a=..., this=0xffffffffffffffe8) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:3348
#3  std::string::_Rep::_M_dispose (__a=..., this=0xffffffffffffffe8) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:3332
#4  std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string (this=0x7f4dcb787570, __in_chrg=<optimized out>) at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/basic_string.h:3768
#5  IPAddress::~IPAddress (this=0x7f4dcb787558, __in_chrg=<optimized out>) at src/ip_address.h:54
#6  user_params_t::~user_params_t (this=0x7f4dcb787500, __in_chrg=<optimized out>) at src/defs.h:692
#7  0x00007f4dca418eda in __cxa_finalize () from /lib64/libc.so.6
#8  0x00007f4dcb44a353 in ?? ()
#9  0x00007ffcd19e2c90 in ?? ()
#10 0x00007f4dcb79bfba in _dl_fini () from /lib64/ld-linux-x86-64.so.2
Backtrace stopped: frame did not save the PC
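
The addresses in frames #0 and #2 can be cross-checked with a little arithmetic. The `std::string::_Rep` frames indicate the copy-on-write string implementation, where the string holds a pointer to its character data and the `_Rep` header sits immediately before that data. The sketch below assumes the usual x86_64 layout (a 24-byte header with the reference count at offset 16); these two constants and all function names are illustrative assumptions, not taken from sockperf or libstdc++ sources. Under those assumptions, `this=0xffffffffffffffe8` corresponds to a null data pointer, which would be consistent with the string member being zeroed, never constructed, or already destroyed when the destructor ran. It does not by itself identify the root cause.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// ASSUMPTION: copy-on-write std::string header on x86_64 is
// { size_t length; size_t capacity; int refcount; + padding } = 24 bytes,
// with the reference count at offset 16.
constexpr std::uint64_t kRepHeaderSize  = 24;
constexpr std::uint64_t kRefcountOffset = 16;

// Character-data address that follows a _Rep header at rep_addr.
// Unsigned arithmetic wraps modulo 2^64, which is well defined.
constexpr std::uint64_t data_pointer_from_rep(std::uint64_t rep_addr) {
    return rep_addr + kRepHeaderSize;
}

// Address of the reference counter inside a _Rep header at rep_addr.
constexpr std::uint64_t refcount_addr_from_rep(std::uint64_t rep_addr) {
    return rep_addr + kRefcountOffset;
}

// Compile-time check against the values printed in the backtrace.
static_assert(data_pointer_from_rep(0xffffffffffffffe8ULL) == 0,
              "rep address from frame #2 maps back to a null data pointer");
static_assert(refcount_addr_from_rep(0xffffffffffffffe8ULL) == 0xfffffffffffffff8ULL,
              "refcount address matches __mem in frame #0");

// Prints the reconstruction for the addresses seen in frames #0 and #2.
void print_reconstruction() {
    constexpr std::uint64_t rep = 0xffffffffffffffe8ULL;  // `this` in frame #2
    constexpr std::uint64_t mem = 0xfffffffffffffff8ULL;  // `__mem` in frame #0
    assert(refcount_addr_from_rep(rep) == mem);
    std::printf("data pointer = 0x%" PRIx64 "\n", data_pointer_from_rep(rep));
}
```

The block compiles as a standalone translation unit; calling `print_reconstruction()` from a `main` prints the reconstructed data pointer.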

Build environment: gcc 8.3, Linux x86_64

Test command: ./sockperf pp -i xxx.xxx.xxx.xxx --load-vma

@igor-ivanov @EldarShalev

The v5.8-2.0.3.0 LTS MLNX_OFED driver ships sockperf v3.10, which also crashes.

Hello @g199209.
Could you clarify the following?

  • the issue is reproduced with 4716b150645180735fdbf816adede0a39dc64e86
  • the issue is not seen before 4716b150645180735fdbf816adede0a39dc64e86
  • the issue also exists in sockperf v3.10
  • libvma is taken from v5.8-2.0.3.0 LTS MLNX_OFED

Is that correct?

Yes.

Is there any plan to fix this bug?

@igor-ivanov Any progress?

Hello @g199209,

The failure you described does not happen on my setup.
libvma(v9.7.2) - from MLNX OFED 5.8-2.0.3.0
sockperf(version #3.8-21.git4716b1506451)

server:

$ sudo sockperf sr -i 192.168.105.3 --load-vma=libvma.so
 VMA INFO: ---------------------------------------------------------------------------
 VMA INFO: VMA_VERSION: 9.7.2-1 Release built on Nov 14 2022 17:03:52
 VMA INFO: Cmd Line: sockperf sr -i 192.168.105.3 --load-vma=libvma.so
 VMA INFO: OFED Version: MLNX_OFED_LINUX-5.9-0.5.5.2:
 VMA INFO: ---------------------------------------------------------------------------
 VMA INFO: Log Level                      INFO                       [VMA_TRACELEVEL]
 VMA INFO: ---------------------------------------------------------------------------
sockperf: == version #3.8-21.git4716b1506451 ==
sockperf: [SERVER] listen on:
[ 0] IP = 192.168.105.3   PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 1570954] using recvfrom() to block on socket(s)
^Csockperf: Test end (interrupted by user)
sockperf: Total 204243 messages received and handled
sockperf: cleanupAfterLoop() exit

client:

$ sudo sockperf pp -i 192.168.105.3 --load-vma=libvma.so
 VMA INFO: ---------------------------------------------------------------------------
 VMA INFO: VMA_VERSION: 9.7.2-1 Release built on Nov 14 2022 17:03:52
 VMA INFO: Cmd Line: sockperf pp -i 192.168.105.3 --load-vma=libvma.so
 VMA INFO: OFED Version: MLNX_OFED_LINUX-5.9-0.5.5.2:
 VMA INFO: ---------------------------------------------------------------------------
 VMA INFO: Log Level                      INFO                       [VMA_TRACELEVEL]
 VMA INFO: ---------------------------------------------------------------------------
sockperf: == version #3.8-21.git4716b1506451 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 192.168.105.3   PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.000 sec; Warm up time=400 msec; SentMessages=204243; ReceivedMessages=204242
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=0.550 sec; SentMessages=126961; ReceivedMessages=126961
sockperf: ====> avg-latency=2.155 (std-dev=0.271, mean-ad=0.087, median-ad=0.103, siqr=0.075, cv=0.126, std-error=0.001, 99.0% ci=[2.153, 2.157])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 2.155 usec
sockperf: Total 126961 observations; each percentile contains 1269.61 observations
sockperf: ---> <MAX> observation =   48.565
sockperf: ---> percentile 99.999 =   45.589
sockperf: ---> percentile 99.990 =    4.478
sockperf: ---> percentile 99.900 =    2.634
sockperf: ---> percentile 99.000 =    2.374
sockperf: ---> percentile 90.000 =    2.264
sockperf: ---> percentile 75.000 =    2.224
sockperf: ---> percentile 50.000 =    2.184
sockperf: ---> percentile 25.000 =    2.073
sockperf: ---> <MIN> observation =    1.823

It does not happen with sockperf v3.10 either.
Please consider compiling the current libvma (https://github.com/Mellanox/libvma) and sockperf from source and checking whether the failure still occurs on your setup.