Mellanox/sockperf

"--no-rdtsc " option makes the pps/mps incorrect.

jiangdong-sun opened this issue · 6 comments

Note: The sockperf is the latest version just pulled out.

  • Without "--no-rdtsc" option

$ sockperf ping-pong --tcp -i 192.168.10.10 -p 50001 -t 10 --load-vma --pps 10 --dontwarmup
sockperf: == version # ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 192.168.10.10 PORT = 50001 # TCP
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=102; ReceivedMessages=101
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=9.900 sec; SentMessages=100; ReceivedMessages=100

The actual pps is correct.

sockperf: ====> avg-lat= 6.703 (std-dev=2.345)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 6.703 usec
sockperf: Total 100 observations; each percentile contains 1.00 observations
sockperf: ---> observation = 16.555
sockperf: ---> percentile 99.999 = 16.555
sockperf: ---> percentile 99.990 = 16.555
sockperf: ---> percentile 99.900 = 16.555
sockperf: ---> percentile 99.000 = 16.096
sockperf: ---> percentile 90.000 = 9.114
sockperf: ---> percentile 75.000 = 6.995
sockperf: ---> percentile 50.000 = 5.901
sockperf: ---> percentile 25.000 = 5.318
sockperf: ---> observation = 3.957

  • With "--no-rdtsc" option

$ sockperf ping-pong --tcp -i 192.168.10.10 -p 50001 -t 10 --load-vma --pps 10 --dontwarmup --no-rdtsc
sockperf: == version # ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 192.168.10.10 PORT = 50001 # TCP
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=33; ReceivedMessages=33
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=9.300 sec; SentMessages=31; ReceivedMessages=31

The actual pps is down to one third of the requested pps.

sockperf: ====> avg-lat= 9.099 (std-dev=2.630)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 9.099 usec
sockperf: Total 31 observations; each percentile contains 0.31 observations
sockperf: ---> observation = 16.920
sockperf: ---> percentile 99.999 = 16.920
sockperf: ---> percentile 99.990 = 16.920
sockperf: ---> percentile 99.900 = 16.920
sockperf: ---> percentile 99.000 = 16.920
sockperf: ---> percentile 90.000 = 12.473
sockperf: ---> percentile 75.000 = 10.358
sockperf: ---> percentile 50.000 = 8.604
sockperf: ---> percentile 25.000 = 7.222
sockperf: ---> observation = 5.275

10x for the report.

Can you tell if you see this behavior related to the command line arguments that you used load-vma with very low pps=10, or that it can also happen to you in other situations that you use –no-rdtsc ?

Regards,
Avner

From: jiangdong-sun [mailto:notifications@github.com]
Sent: Tuesday, November 01, 2016 08:42
To: Mellanox/sockperf sockperf@noreply.github.com
Subject: [Mellanox/sockperf] "--no-rdtsc " will make the pps/mps incorrect. (#77)

  • Without "--no-rdtsc" option

$ sockperf ping-pong --tcp -i 192.168.10.10 -p 50001 -t 10 --load-vma --pps 10 --dontwarmup
sockperf: == version # ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 192.168.10.10 PORT = 50001 # TCP
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=102; ReceivedMessages=101
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=9.900 sec; SentMessages=100; ReceivedMessages=100

The actual pps is correct.

sockperf: ====> avg-lat= 6.703 (std-dev=2.345)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 6.703 usec
sockperf: Total 100 observations; each percentile contains 1.00 observations
sockperf: ---> observation = 16.555
sockperf: ---> percentile 99.999 = 16.555
sockperf: ---> percentile 99.990 = 16.555
sockperf: ---> percentile 99.900 = 16.555
sockperf: ---> percentile 99.000 = 16.096
sockperf: ---> percentile 90.000 = 9.114
sockperf: ---> percentile 75.000 = 6.995
sockperf: ---> percentile 50.000 = 5.901
sockperf: ---> percentile 25.000 = 5.318
sockperf: ---> observation = 3.957

  • With "--no-rdtsc" option

$ sockperf ping-pong --tcp -i 192.168.10.10 -p 50001 -t 10 --load-vma --pps 10 --dontwarmup --no-rdtsc
sockperf: == version # ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 192.168.10.10 PORT = 50001 # TCP
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=33; ReceivedMessages=33
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=9.300 sec; SentMessages=31; ReceivedMessages=31

The actual pps is down to one third of the requested pps.

sockperf: ====> avg-lat= 9.099 (std-dev=2.630)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 9.099 usec
sockperf: Total 31 observations; each percentile contains 0.31 observations
sockperf: ---> observation = 16.920
sockperf: ---> percentile 99.999 = 16.920
sockperf: ---> percentile 99.990 = 16.920
sockperf: ---> percentile 99.900 = 16.920
sockperf: ---> percentile 99.000 = 16.920
sockperf: ---> percentile 90.000 = 12.473
sockperf: ---> percentile 75.000 = 10.358
sockperf: ---> percentile 50.000 = 8.604
sockperf: ---> percentile 25.000 = 7.222
sockperf: ---> observation = 5.275


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHubhttps://github.com//issues/77, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ADlNzHIJ2wz82YgFJhYQdVgggrmNPsFuks5q5t84gaJpZM4Kl0HJ.

Even don't load libvma.so, while with --no-rdtsc option, the issue still exists as below.

$ LD_PRELOAD= sockperf ping-pong --tcp -i 192.168.10.10 -p 50001 -t 10 --pps 10 --dontwarmup --no-rdtsc
sockperf: == version #2.7-43.git3ee62bd8107a ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 192.168.10.10 PORT = 50001 # TCP
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=33; ReceivedMessages=33
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=9.300 sec; SentMessages=31; ReceivedMessages=31
sockperf: ====> avg-lat= 12.551 (std-dev=1.455)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 12.551 usec
sockperf: Total 31 observations; each percentile contains 0.31 observations
sockperf: ---> observation = 15.412
sockperf: ---> percentile 99.99 = 15.412
sockperf: ---> percentile 99.90 = 15.412
sockperf: ---> percentile 99.50 = 15.412
sockperf: ---> percentile 99.00 = 15.412
sockperf: ---> percentile 95.00 = 14.626
sockperf: ---> percentile 90.00 = 14.462
sockperf: ---> percentile 75.00 = 13.582
sockperf: ---> percentile 50.00 = 12.488
sockperf: ---> percentile 25.00 = 11.638
sockperf: ---> observation = 10.188

Perhaps reading the clock in your system is very slow ~ 300 milliseconds?

I think that if add --debug or something like that to the command line it will print you how much time it takes to take time in your system.

From: jiangdong-sun [mailto:notifications@github.com]
Sent: Monday, November 07, 2016 04:42
To: Mellanox/sockperf sockperf@noreply.github.com
Cc: Avner BenHanoch avnerbh@gmail.com; Comment comment@noreply.github.com
Subject: Re: [Mellanox/sockperf] "--no-rdtsc " option makes the pps/mps incorrect. (#77)

Even don't load libvma.so, while with --no-rdtsc option, the issue still exists as below.

$ LD_PRELOAD= sockperf ping-pong --tcp -i 192.168.10.10 -p 50001 -t 10 --pps 10 --dontwarmup --no-rdtsc
sockperf: == version #2#2 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 192.168.10.10 PORT = 50001 # TCP
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=33; ReceivedMessages=33
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=9.300 sec; SentMessages=31; ReceivedMessages=31
sockperf: ====> avg-lat= 12.551 (std-dev=1.455)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 12.551 usec
sockperf: Total 31 observations; each percentile contains 0.31 observations
sockperf: ---> observation = 15.412
sockperf: ---> percentile 99.99 = 15.412
sockperf: ---> percentile 99.90 = 15.412
sockperf: ---> percentile 99.50 = 15.412
sockperf: ---> percentile 99.00 = 15.412
sockperf: ---> percentile 95.00 = 14.626
sockperf: ---> percentile 90.00 = 14.462
sockperf: ---> percentile 75.00 = 13.582
sockperf: ---> percentile 50.00 = 12.488
sockperf: ---> percentile 25.00 = 11.638
sockperf: ---> observation = 10.188


You are receiving this because you commented.
Reply to this email directly, view it on GitHubhttps://github.com//issues/77#issuecomment-258735141, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ADlNzAE6FpkrqJ-f5E53QUj53bw9jmB-ks5q7o_fgaJpZM4Kl0HJ.

Perhaps reading the clock in your system is very slow ~ 300 milliseconds?

It's impossible.
Look the running data with debug

$ sockperf ping-pong --debug --tcp -i 192.168.10.10 -p 50001 -t 10 --pps 10 --dontwarmup --no-rdtsc
sockperf: +INFO:
mode = 0
with_sock_accl = 0
msg_size = 14
msg_size_range = 0
sec_test_duration = 10
data_integrity = 0
packetrate_stats_print_ratio = 0
burst_size = 1
packetrate_stats_print_details = 0
fd_handler_type = 0
mthread_server = 0
sock_buff_size = 0
threads_num = 1
threads_affinity =
is_blocked = 1
is_nonblocked_send = 0
do_warmup = 0
pre_warmup_wait = 0
is_vmarxfiltercb = 0
is_vmazcopyread = 0
mc_loop_disable = 1
mc_ttl = 2
tcp_nodelay = 1
client_work_with_srv_num = 1
b_server_reply_via_uc = 0
b_server_dont_reply = 0
b_server_detect_gaps = 0
mps = 10
client_bind_info = 0.0.0.0:0
reply_every = 1
b_client_ping_pong = 1
b_no_rdtsc = 1
sender_affinity =
receiver_affinity =
b_stream = 0
daemonize = 0
feedfile_name =
tos = 0

sockperf: == version #2.7-43.git3ee62bd8107a ==
sockperf: +INFO: taking time, using the given settings, consumes 33.646 nsec
sockperf: +INFO: taking rdtsc directly consumes 8.675 nsec
sockperf: [fd=17] Binding to: 0.0.0.0:0...
sockperf: [fd=17] Connecting to: 192.168.10.10:50001...
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 192.168.10.10 PORT = 50001 # TCP
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=33; ReceivedMessages=33
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=9.300 sec; SentMessages=31; ReceivedMessages=31
sockperf: ====> avg-lat= 9.356 (std-dev=1.543)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 9.356 usec
sockperf: Total 31 observations; each percentile contains 0.31 observations
sockperf: ---> observation = 12.102
sockperf: ---> percentile 99.99 = 12.102
sockperf: ---> percentile 99.90 = 12.102
sockperf: ---> percentile 99.50 = 12.102
sockperf: ---> percentile 99.00 = 12.102
sockperf: ---> percentile 95.00 = 11.561
sockperf: ---> percentile 90.00 = 11.485
sockperf: ---> percentile 75.00 = 10.435
sockperf: ---> percentile 50.00 = 9.124
sockperf: ---> percentile 25.00 = 8.240
sockperf: ---> observation = 7.040

+Alex +Ophir

Yes. you are right. Taking the time in your system is quite fast:

sockperf: +INFO: taking time, using the given settings, consumes 33.646 nsec
sockperf: +INFO: taking rdtsc directly consumes 8.675 nsec
and still you ses only ~30 packets in 10 seconds instead of 100.

Since with RDTSC all is good, I believe that your network setup is okay.

Perhaps, Alex, Ophir, can check if we reproduce it here and tell more. I don’t have sockperf environment at the moment.

Avner

From: jiangdong-sun [mailto:notifications@github.com]
Sent: Monday, November 07, 2016 10:07
To: Mellanox/sockperf sockperf@noreply.github.com
Cc: Avner BenHanoch avnerbh@gmail.com; Comment comment@noreply.github.com
Subject: Re: [Mellanox/sockperf] "--no-rdtsc " option makes the pps/mps incorrect. (#77)

Perhaps reading the clock in your system is very slow ~ 300 milliseconds?
It's impossible.
Look the running data with debug
$ sockperf ping-pong --debug --tcp -i 192.168.10.10 -p 50001 -t 10 --pps 10 --dontwarmup --no-rdtsc
sockperf: +INFO:
mode = 0
with_sock_accl = 0
msg_size = 14
msg_size_range = 0
sec_test_duration = 10
data_integrity = 0
packetrate_stats_print_ratio = 0
burst_size = 1
packetrate_stats_print_details = 0
fd_handler_type = 0
mthread_server = 0
sock_buff_size = 0
threads_num = 1
threads_affinity =

is_blocked = 1
is_nonblocked_send = 0
do_warmup = 0
pre_warmup_wait = 0
is_vmarxfiltercb = 0
is_vmazcopyread = 0
mc_loop_disable = 1
mc_ttl = 2
tcp_nodelay = 1
client_work_with_srv_num = 1
b_server_reply_via_uc = 0
b_server_dont_reply = 0
b_server_detect_gaps = 0
mps = 10
client_bind_info = 0.0.0.0:0
reply_every = 1
b_client_ping_pong = 1
b_no_rdtsc = 1
sender_affinity =
receiver_affinity =
b_stream = 0
daemonize = 0
feedfile_name =
tos = 0

sockperf: == version #2#2 ==
sockperf: +INFO: taking time, using the given settings, consumes 33.646 nsec
sockperf: +INFO: taking rdtsc directly consumes 8.675 nsec
sockperf: [fd=17] Binding to: 0.0.0.0:0...
sockperf: [fd=17] Connecting to: 192.168.10.10:50001...
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 192.168.10.10 PORT = 50001 # TCP
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=33; ReceivedMessages=33
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=9.300 sec; SentMessages=31; ReceivedMessages=31
sockperf: ====> avg-lat= 9.356 (std-dev=1.543)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 9.356 usec
sockperf: Total 31 observations; each percentile contains 0.31 observations
sockperf: ---> observation = 12.102
sockperf: ---> percentile 99.99 = 12.102
sockperf: ---> percentile 99.90 = 12.102
sockperf: ---> percentile 99.50 = 12.102
sockperf: ---> percentile 99.00 = 12.102
sockperf: ---> percentile 95.00 = 11.561
sockperf: ---> percentile 90.00 = 11.485
sockperf: ---> percentile 75.00 = 10.435
sockperf: ---> percentile 50.00 = 9.124
sockperf: ---> percentile 25.00 = 8.240
sockperf: ---> observation = 7.040


You are receiving this because you commented.
Reply to this email directly, view it on GitHubhttps://github.com//issues/77#issuecomment-258770157, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ADlNzPG28PKOyvOeRWyR8AbNU2khy-14ks5q7twpgaJpZM4Kl0HJ.

Fix #187 was included into master