MPQUIC and packet loss in a Mininet environment
elhachi opened this issue · 21 comments
Hello,
I'm writing an MPQUIC client-server connection in Go in a Mininet environment. I want to transfer a file from the server to the client, so I send the whole file over a single stream, and on the receiving side I read from the stream into a buffer and write the data to the received file.
Everything went well until I added loss/delay options to the Mininet environment. Although the configured loss was only 1%, the number of packet losses was much bigger than expected; sometimes I lose more than half of the data. I don't know where the problem is exactly. Maybe the MPQUIC protocol and Mininet's way of simulating packet loss and delay do not go well together. If so, could you please point me to software that would be more suitable than Mininet?
What is your exact setup? How do you set up losses? What are the exact commands you are running and from where?
Thank you for your reply, I really appreciate it.
The client-server code is based on this GitHub project: https://github.com/prat-bphc52/VideoStreaming-MPTCP-MPQUIC. I then added loss/delay options to the Mininet environment using the following commands:
net.addLink(router, client, cls=TCLink, bw=10, loss=1, delay='10ms', max_queue_size=25)
net.addLink(router, client, cls=TCLink, bw=10, loss=1, delay='50ms', max_queue_size=100)
net.addLink(router, server, cls=TCLink, bw=10, loss=1, delay='10ms', max_queue_size=25)
net.addLink(router, server, cls=TCLink, bw=10, loss=1, delay='50ms', max_queue_size=100)
You should not limit the max queue size using netem; it won't give you the results you expect (see Section 5.2.2 of my thesis, available at https://qdeconinck.github.io/assets/thesis_deconinck.pdf). To model buffer sizes, rely instead on shaping (tbf or htb) or policing. You can have a look at https://github.com/qdeconinck/minitopo.
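To make that concrete, here is a minimal sketch (my illustration, not the way minitopo configures links) of adding delay/loss with netem and modelling the bottleneck rate and buffer with a child tbf, run from a Mininet script; the function name, interface names and numeric values are assumptions:

# Illustrative sketch only: 'host' is a Mininet node, and the interface is assumed
# to have no other root qdisc (i.e., the link was created without TCLink shaping).
def shape_intf(host, intf):
    # netem adds the artificial delay and random loss
    host.cmd(f'tc qdisc add dev {intf} root handle 1:0 netem delay 10ms loss 1%')
    # a child tbf models the bottleneck rate and buffer instead of netem's queue limit
    host.cmd(f'tc qdisc add dev {intf} parent 1:1 handle 10: '
             'tbf rate 10mbit burst 15k limit 30000')

shape_intf(client, 'client-eth0')
shape_intf(server, 'server-eth0')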
But even when I delete the max queue size option I'm still getting the same behavior.
What is the actual tc command generated by Mininet?
I'm not using the tc command directly. Instead, I'm using the mininet.link.TCIntf class in a Python script, following this page: http://mininet.org/api/classmininet_1_1link_1_1TCIntf.html. I previously tried adding the loss/delay options with tc commands while keeping only the bandwidth in the Python script, but I got an error telling me that these options have to be declared inside the script.
net.addLink(router, client, cls=TCLink, bw=10, loss=1, delay='10ms')
net.addLink(router, client, cls=TCLink, bw=10, loss=1, delay='50ms')
net.addLink(router, server, cls=TCLink, bw=10, loss=1, delay='10ms')
net.addLink(router, server, cls=TCLink, bw=10, loss=1, delay='50ms')
Yes, but it uses tc under the hood. In the Mininet setup, how are the links configured (tc qdisc show when the network is ready to use)?
qdisc htb 5: dev server-eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 direct_qlen 1000
qdisc netem 10: dev server-eth0 parent 5:1 limit 1000 delay 10.0ms loss 1%
qdisc htb 5: dev server-eth1 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 direct_qlen 1000
qdisc netem 10: dev server-eth1 parent 5:1 limit 1000 delay 50.0ms loss 1%
And what about tc class show for htb?
It might also be useful to have an example of the loss pattern you observe (e.g., with logs or a PCAP trace) vs. what you would expect.
I'm a beginner with Mininet, so excuse me if it takes me a while to understand. I ran the following command: tc class show, and it returns nothing...
From what I understand, there is no bandwidth limitation being applied, which may give very strange behaviours. You should look at the exact command being generated (https://github.com/mininet/mininet/blob/270a6ba3335301f1e4757c5fb7ee64c1d3580bf2/mininet/link.py#L316 is the line whose output you want to see; have a look at how to enable such logging in Mininet) and paste the output here.
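For instance, one way to surface the generated commands (a suggestion on my side; the exact output may differ between Mininet versions) is to raise the log level at the top of your topology script:

from mininet.log import setLogLevel

# 'debug' makes Mininet log the commands it runs on each node, which should
# include the tc invocations built by TCIntf/TCLink
setLogLevel('debug')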
I'm sorry, I only know how to use the tc class command, which returns:
class htb 5:1 root leaf 10: prio 0 rate 10000kbit ceil 10000kbit burst 15kb cburst 1600b
Sounds OK then. To further understand what's going on, a PCAP trace along with the logs on both the client and server sides would be nice, to see where the packet losses occur.
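As a purely illustrative sketch (the host variables, interfaces and file paths are assumptions, not something from your setup), the captures could be started on both hosts from the Mininet script before launching the transfer and stopped afterwards:

# Hypothetical helpers: capture the QUIC (UDP) traffic on every interface of
# each host so the client- and server-side traces can be compared afterwards.
def start_captures(client, server):
    client.cmd('tcpdump -i any -w /tmp/client.pcap udp &')
    server.cmd('tcpdump -i any -w /tmp/server.pcap udp &')

def stop_captures(client, server):
    client.cmd('pkill tcpdump')
    server.cmd('pkill tcpdump')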
Hello,
Following your suggestion, I took a PCAP capture and noticed some strange behaviors. The first is the occurrence of more than one ARP packet; to my knowledge, they should only be seen at the beginning of the conversation, when the MAC addresses must be discovered. The second is a "Destination Unreachable (Port Unreachable)" error in an ICMP packet.
I found this note when reading the MultipathTester article: "We notably noticed some connectivity issues with QUIC using IPv6, but in America, we observed better performance using IPv6 rather than IPv4". Could that also be the reason in my case?
- ARP packet exchanges at the beginning of the connection are normal if the ARP tables are initially empty.
- With screen captures, I cannot say much more. The PCAP itself would have been better.
- I thought you were performing Mininet experiments; I'm not sure how the statement you mentioned relates to your current behaviour.
The whole PCAP file:
mpquic_trace_client.zip
I think this loss of connectivity is caused by the huge number of packets coming from the client side (two sources), so the server was too busy to receive the entire data via one path. Note that I create two paths so that the server can use MPQUIC, but it keeps working with only one!
Thank you for your time, I appreciate that.
There are a few strange elements in your trace:
- The connection seems to go idle twice for about 10 seconds. Is this expected?
- How was the capture made? Between packets 553 and 554, the timestamps go backwards (Mar 10, 2022 00:17:34.138101494 CET -> Mar 10, 2022 00:17:33.980234299 CET).
- The ICMP messages seem to indicate that the sockets of the path(s) at the server side were closed in the meantime.
- It would have been interesting to capture at both sides (i.e., client and server) to figure out which packets were actually lost. Furthermore, you should take a look at the implementation logs and try to understand what is going wrong.
The only expected idle period was at the beginning of the connection, because I had to enter some data manually, which takes a few seconds. Otherwise, I did not expect any idle moments.
Thank you so much for your help, I will try to do what you recommend.