60 secs multithread failed to show statistics

Question

60 secs multithread failed to show statistics

Opened this issue 9 years ago · 11 comments

hi.
I got ip_list file:
224.4.4.4:1234
224.4.4.5:4321

then I starting test like this and got next messages:

sockperf sr -f ip_list -m 100 --threads-num 2 --load-vma -F select
VMA INFO : ---------------------------------------------------------------------------
VMA INFO : VMA_VERSION: 7.0.11-0 Development Snapshot built on Dec 3 2015 16:52:56
VMA INFO : Cmd Line: sockperf sr -f ip_list -m 100 --threads-num 2 --load-vma -F select
VMA INFO : Current Time: Thu Jan 21 14:32:29 2016
VMA INFO : Pid: 26235
VMA INFO : OFED Version: MLNX_OFED_LINUX-3.1-1.0.3:
VMA INFO : Architecture: x86_64
VMA INFO : Node: bot4
VMA INFO : ---------------------------------------------------------------------------
VMA INFO : Log Level 3 [VMA_TRACELEVEL]
VMA INFO : ---------------------------------------------------------------------------
sockperf: == version #2.7-41.git241c4528ae75 ==
sockperf: Running 2 threads to manage 2 sockets
sockperf: [SERVER] listen on:
[ 0] IP = 224.4.4.5 PORT = 4321 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [SERVER] listen on:
[ 0] IP = 224.4.4.4 PORT = 1234 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 26246] using select() to block on socket(s)
sockperf: [tid 26245] using select() to block on socket(s)
^Csockperf: Got signal 2 - exiting
sockperf: Total 1 messages received and handled
sockperf: Test end (interrupted by signal 2)
sockperf: Test end (interrupted by signal 2)
sockperf: server_select_per_thread() exit

and client:
sockperf pp -f ip_list -m 100 --srv-num 2 --time 60
sockperf: == version #2.7-41.git241c4528ae75.dirty ==
sockperf[CLIENT] send on:sockperf: using select() to block on socket(s)

[ 0] IP = 224.4.4.4 PORT = 1234 # UDP
[ 1] IP = 224.4.4.5 PORT = 4321 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=60.100 sec; SentMessages=1; ReceivedMessages=1
sockperf: ========= Printing statistics for Server No: 0
sockperf: ERROR: _seqN > m_maxSequenceNo (errno=4 Interrupted system call)

whats wrong?

Answer 1 · 2016-01-21T20:46:01.000Z

As far as I recall '--srv-num' was never implemented in sockperf.
it was a preparation for 1 client to send a single MC packet to multiple different server, which should be echo-ing back from different source-addr into the array of size '--srv-num'.

I'm guessing VMA in the picture is just confusing here.
Does it work without VMA loaded?

Answer 2 · 2016-01-22T09:36:43.000Z

But i starting VMA only on server, dont use it on client.
and if I test without VMA at all, got the same error

Answer 3 · 2018-12-10T19:21:10.000Z

What is this check _seqN > m_maxSequenceNo supposed to accomplish, actually?

The error is 100% reproducible on ppc64le with

./autogen.sh
./configure --prefix=/opt/sockperf-upstream
make && make install
SOCKPERF_PATH=/opt/sockperf-upstream/
$SOCKPERF_PATH/bin/sockperf server --tcp & SERVER_PID=$!
$SOCKPERF_PATH/bin/sockperf under-load --tcp -t 30 --mps 10000 --msg-size 14 --no-rdtsc

sockperf: == version #3.6-0.git737d1e3e5576 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 0.0.0.0 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: ERROR: _seqN > m_maxSequenceNo (errno=0 Success)

Answer 4 · 2018-12-10T19:44:12.000Z

This code looks really fishy to me:

    uint64_t _maxTestDuration =
        1 + s_user_params.sec_test_duration +
        s_user_params.warmup_msec / 1000; // + 1sec for timer inaccuracy safety
    uint64_t _maxSequenceNo = _maxTestDuration * s_user_params.mps +
                              10 * s_user_params.reply_every; // + 10 replies for safety

What is that +10 replies for safety?

Answer 5 · 2018-12-10T20:04:06.000Z

Random constant increase in the code 'fixes' the issue but does not change the fact that the whole calculation is dubious adding bits of padding here and there.

--- a/src/sockperf.cpp
+++ b/src/sockperf.cpp
@@ -3193,7 +3193,7 @@ int bringup(const int *p_daemonize) {
1 + s_user_params.sec_test_duration +
s_user_params.warmup_msec / 1000; // + 1sec for timer inaccuracy safety
uint64_t _maxSequenceNo = _maxTestDuration * s_user_params.mps +

                             10 * s_user_params.reply_every; // + 10 replies for safety

                             10000 * s_user_params.reply_every; // + 10 replies for safety
   _maxSequenceNo += s_user_params.burst_size; // needed for the case burst_size > mps

   if (s_user_params.pPlaybackVector) {

Answer 6 · 2020-04-29T14:27:53.000Z

@hramrach, following commit as 9716c76 should resolve issue

Answer 7 · 2022-04-14T09:32:23.000Z

Still broken in 3.8 which contains this commit.

Answer 8 · 2022-04-14T09:33:14.000Z

make[2]: Leaving directory '/root/sockperf'
make[1]: Leaving directory '/root/sockperf'
make[1]: Entering directory '/root/sockperf'
make[2]: Entering directory '/root/sockperf'
 /usr/bin/mkdir -p '/opt/sockperf-upstream/bin'
  /bin/sh ./libtool   --mode=install /usr/bin/install -c sockperf '/opt/sockperf-upstream/bin'
libtool: install: /usr/bin/install -c sockperf /opt/sockperf-upstream/bin/sockperf
 /usr/bin/mkdir -p '/opt/sockperf-upstream/share/doc/sockperf'
 /usr/bin/install -c -m 644 README.md authors news version copying '/opt/sockperf-upstream/share/doc/sockperf'
make[2]: Leaving directory '/root/sockperf'
make[1]: Leaving directory '/root/sockperf'
jostaberry-8:~/sockperf/:[0]# SOCKPERF_PATH=/opt/sockperf-upstream/
jostaberry-8:~/sockperf/:[0]# $SOCKPERF_PATH/bin/sockperf server --tcp & SERVER_PID=$!
[1] 14258
jostaberry-8:~/sockperf/:[0]# $SOCKPERF_PATH/bin/sockperf under-load --tcp -t 30 --mps 10000 --msg-size 14 --no-rdtscsockperf: == version #3.8-0.git31ee322aa82a == 
sockperf: [SERVER] listen on:
[ 0] IP = ::              PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 14258] using recvfrom() to block on socket(s)

sockperf: == version #3.8-0.git31ee322aa82a == 
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = ::1             PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: ERROR: _seqN > m_maxSequenceNo (errno=0 Success)

Answer 9 · 2022-04-14T09:37:14.000Z

Maybe more robust approach would be to provide the expected and actual number of packets in the statistics rather than failing to provide any statistics when the number of packets is not as expected.

Then the result would also show what kind of error happened with the packet counting.

It may be that you rely on some stronger memory consistency guarantees that hold only on some architectures for updating the packet count and it is simply corrupted.

Answer 10 · 2022-04-15T12:22:56.000Z

Let think if it can be improved.

On my system your scenario works:

sockperf sr --tcp
sockperf: == version #3.8-no.git ==
sockperf: [SERVER] listen on:
[ 0] IP = ::              PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 278522] using recvfrom() to block on socket(s)
^Csockperf: Test end (interrupted by user)
sockperf: Total 112498 messages received and handled

sockperf ul --tcp -t 30 --mps 10000 --msg-size 14 --no-rdtsc
sockperf: == version #3.8-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = ::1             PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=30.000 sec; Warm up time=400 msec; SentMessages=112498; ReceivedMessages=1124
sockperf: ========= Printing statistics for Server No: 0
sockperf: Test end (interrupted by signal 2)
sockperf: [Valid Duration] RunTime=28.695 sec; SentMessages=107601; ReceivedMessages=1077
sockperf: ====> avg-latency=60.432 (std-dev=7.481, mean-ad=5.967, median-ad=2.395, siqr=5.828, cv=0.124, std-error=0.228, 99.0% ci=[59.845, 61.019])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 60.432 usec
sockperf: Total 1077 observations; each percentile contains 10.77 observations
sockperf: ---> <MAX> observation =  124.571
sockperf: ---> percentile 99.999 =  124.571
sockperf: ---> percentile 99.990 =  124.571
sockperf: ---> percentile 99.900 =   92.184
sockperf: ---> percentile 99.000 =   80.884
sockperf: ---> percentile 90.000 =   65.361
sockperf: ---> percentile 75.000 =   64.672
sockperf: ---> percentile 50.000 =   63.496
sockperf: ---> percentile 25.000 =   53.016
sockperf: ---> <MIN> observation =   37.815

Answer 11 · 2022-05-27T16:53:48.000Z

Fix #187 was included into master