60 secs multithread failed to show statistics
Opened this issue · 11 comments
hi.
I got ip_list file:
224.4.4.4:1234
224.4.4.5:4321
then I starting test like this and got next messages:
sockperf sr -f ip_list -m 100 --threads-num 2 --load-vma -F select
VMA INFO : ---------------------------------------------------------------------------
VMA INFO : VMA_VERSION: 7.0.11-0 Development Snapshot built on Dec 3 2015 16:52:56
VMA INFO : Cmd Line: sockperf sr -f ip_list -m 100 --threads-num 2 --load-vma -F select
VMA INFO : Current Time: Thu Jan 21 14:32:29 2016
VMA INFO : Pid: 26235
VMA INFO : OFED Version: MLNX_OFED_LINUX-3.1-1.0.3:
VMA INFO : Architecture: x86_64
VMA INFO : Node: bot4
VMA INFO : ---------------------------------------------------------------------------
VMA INFO : Log Level 3 [VMA_TRACELEVEL]
VMA INFO : ---------------------------------------------------------------------------
sockperf: == version #2.7-41.git241c4528ae75 ==
sockperf: Running 2 threads to manage 2 sockets
sockperf: [SERVER] listen on:
[ 0] IP = 224.4.4.5 PORT = 4321 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [SERVER] listen on:
[ 0] IP = 224.4.4.4 PORT = 1234 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 26246] using select() to block on socket(s)
sockperf: [tid 26245] using select() to block on socket(s)
^Csockperf: Got signal 2 - exiting
sockperf: Total 1 messages received and handled
sockperf: Test end (interrupted by signal 2)
sockperf: Test end (interrupted by signal 2)
sockperf: server_select_per_thread() exit
and client:
sockperf pp -f ip_list -m 100 --srv-num 2 --time 60
sockperf: == version #2.7-41.git241c4528ae75.dirty ==
sockperf[CLIENT] send on:sockperf: using select() to block on socket(s)
[ 0] IP = 224.4.4.4 PORT = 1234 # UDP
[ 1] IP = 224.4.4.5 PORT = 4321 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=60.100 sec; SentMessages=1; ReceivedMessages=1
sockperf: ========= Printing statistics for Server No: 0
sockperf: ERROR: _seqN > m_maxSequenceNo (errno=4 Interrupted system call)
whats wrong?
As far as I recall '--srv-num' was never implemented in sockperf.
it was a preparation for 1 client to send a single MC packet to multiple different server, which should be echo-ing back from different source-addr into the array of size '--srv-num'.
I'm guessing VMA in the picture is just confusing here.
Does it work without VMA loaded?
But i starting VMA only on server, dont use it on client.
and if I test without VMA at all, got the same error
What is this check _seqN > m_maxSequenceNo supposed to accomplish, actually?
The error is 100% reproducible on ppc64le with
./autogen.sh
./configure --prefix=/opt/sockperf-upstream
make && make install
SOCKPERF_PATH=/opt/sockperf-upstream/
$SOCKPERF_PATH/bin/sockperf under-load --tcp -t 30 --mps 10000 --msg-size 14 --no-rdtsc
sockperf: == version #3.6-0.git737d1e3e5576 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 0.0.0.0 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: ERROR: _seqN > m_maxSequenceNo (errno=0 Success)
This code looks really fishy to me:
uint64_t _maxTestDuration =
1 + s_user_params.sec_test_duration +
s_user_params.warmup_msec / 1000; // + 1sec for timer inaccuracy safety
uint64_t _maxSequenceNo = _maxTestDuration * s_user_params.mps +
10 * s_user_params.reply_every; // + 10 replies for safety
What is that +10 replies for safety?
Random constant increase in the code 'fixes' the issue but does not change the fact that the whole calculation is dubious adding bits of padding here and there.
--- a/src/sockperf.cpp
+++ b/src/sockperf.cpp
@@ -3193,7 +3193,7 @@ int bringup(const int *p_daemonize) {
1 + s_user_params.sec_test_duration +
s_user_params.warmup_msec / 1000; // + 1sec for timer inaccuracy safety
uint64_t _maxSequenceNo = _maxTestDuration * s_user_params.mps +
-
10 * s_user_params.reply_every; // + 10 replies for safety
-
10000 * s_user_params.reply_every; // + 10 replies for safety _maxSequenceNo += s_user_params.burst_size; // needed for the case burst_size > mps if (s_user_params.pPlaybackVector) {
Still broken in 3.8 which contains this commit.
make[2]: Leaving directory '/root/sockperf'
make[1]: Leaving directory '/root/sockperf'
make[1]: Entering directory '/root/sockperf'
make[2]: Entering directory '/root/sockperf'
/usr/bin/mkdir -p '/opt/sockperf-upstream/bin'
/bin/sh ./libtool --mode=install /usr/bin/install -c sockperf '/opt/sockperf-upstream/bin'
libtool: install: /usr/bin/install -c sockperf /opt/sockperf-upstream/bin/sockperf
/usr/bin/mkdir -p '/opt/sockperf-upstream/share/doc/sockperf'
/usr/bin/install -c -m 644 README.md authors news version copying '/opt/sockperf-upstream/share/doc/sockperf'
make[2]: Leaving directory '/root/sockperf'
make[1]: Leaving directory '/root/sockperf'
jostaberry-8:~/sockperf/:[0]# SOCKPERF_PATH=/opt/sockperf-upstream/
jostaberry-8:~/sockperf/:[0]# $SOCKPERF_PATH/bin/sockperf server --tcp & SERVER_PID=$!
[1] 14258
jostaberry-8:~/sockperf/:[0]# $SOCKPERF_PATH/bin/sockperf under-load --tcp -t 30 --mps 10000 --msg-size 14 --no-rdtscsockperf: == version #3.8-0.git31ee322aa82a ==
sockperf: [SERVER] listen on:
[ 0] IP = :: PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 14258] using recvfrom() to block on socket(s)
sockperf: == version #3.8-0.git31ee322aa82a ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = ::1 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: ERROR: _seqN > m_maxSequenceNo (errno=0 Success)
Maybe more robust approach would be to provide the expected and actual number of packets in the statistics rather than failing to provide any statistics when the number of packets is not as expected.
Then the result would also show what kind of error happened with the packet counting.
It may be that you rely on some stronger memory consistency guarantees that hold only on some architectures for updating the packet count and it is simply corrupted.
Let think if it can be improved.
On my system your scenario works:
sockperf sr --tcp
sockperf: == version #3.8-no.git ==
sockperf: [SERVER] listen on:
[ 0] IP = :: PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 278522] using recvfrom() to block on socket(s)
^Csockperf: Test end (interrupted by user)
sockperf: Total 112498 messages received and handled
sockperf ul --tcp -t 30 --mps 10000 --msg-size 14 --no-rdtsc
sockperf: == version #3.8-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = ::1 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=30.000 sec; Warm up time=400 msec; SentMessages=112498; ReceivedMessages=1124
sockperf: ========= Printing statistics for Server No: 0
sockperf: Test end (interrupted by signal 2)
sockperf: [Valid Duration] RunTime=28.695 sec; SentMessages=107601; ReceivedMessages=1077
sockperf: ====> avg-latency=60.432 (std-dev=7.481, mean-ad=5.967, median-ad=2.395, siqr=5.828, cv=0.124, std-error=0.228, 99.0% ci=[59.845, 61.019])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 60.432 usec
sockperf: Total 1077 observations; each percentile contains 10.77 observations
sockperf: ---> <MAX> observation = 124.571
sockperf: ---> percentile 99.999 = 124.571
sockperf: ---> percentile 99.990 = 124.571
sockperf: ---> percentile 99.900 = 92.184
sockperf: ---> percentile 99.000 = 80.884
sockperf: ---> percentile 90.000 = 65.361
sockperf: ---> percentile 75.000 = 64.672
sockperf: ---> percentile 50.000 = 63.496
sockperf: ---> percentile 25.000 = 53.016
sockperf: ---> <MIN> observation = 37.815
Fix #187 was included into master