justincpresley/ndn-python-svs

Delay spike after ~90 seconds

Closed this issue · 4 comments

Hello Justin,
I am using the chat app to transfer bytes between two devices, and I noticed a delay spike in the data transfer after about 90 seconds, as if some lag or congestion were causing it. The time delta between consecutive packets at the receiver is plotted below (in milliseconds); it shows a spike followed by a dip. Any idea what might be causing that?
Screenshot from 2022-04-10 02-43-59

Hello AE,
This is very interesting. Observant, I like it.

I have a few ideas of what it could be.

  • It could be the sliding window, if you are publishing faster than the receiver is retrieving.
  • It could simply be delay from the Forwarder.
  • It could also be that there are too many async tasks, which causes congestion / lag in getting to the task that retrieves the publication.
  • It could be a bug.
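
To make the async-task idea concrete, here is a minimal sketch using only the Python standard library. It is not taken from ndn-python-svs: `ticker` and `hog` are hypothetical stand-ins for the retrieval task and for any competing task that holds the event loop without yielding, and the timings are arbitrary.

```python
import asyncio
import time
from statistics import median

async def ticker(stamps, period_s=0.01, count=20):
    """Hypothetical stand-in for the retrieval task: log each real wake-up time."""
    for _ in range(count):
        await asyncio.sleep(period_s)
        stamps.append(time.monotonic())

async def hog(start_after_s=0.03, busy_s=0.1):
    """Hypothetical competing task that holds the event loop without yielding."""
    await asyncio.sleep(start_after_s)
    time.sleep(busy_s)  # blocking call: no other task on the loop runs meanwhile

async def main():
    stamps = []
    await asyncio.gather(ticker(stamps), hog())
    # Gaps between consecutive wake-ups, in milliseconds.
    return [(b - a) * 1000.0 for a, b in zip(stamps, stamps[1:])]

gaps_ms = asyncio.run(main())
print(f"median gap: {median(gaps_ms):.1f} ms, worst gap: {max(gaps_ms):.1f} ms")
```

The worst gap should come out far larger than the median, because the ticker cannot wake up while the other task holds the loop. If the real application behaves this way, the spike would line up with whatever task is running at that moment.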

I wonder if this spike/drop is cyclic, recurring over the life of a long-running SVS application...

Kind regards,
Justin

Hello Justin, thanks for the insights.
I was using a PC and a Raspberry Pi in the initial setup, then I switched to testing with the PC only (to rule out HW issues) over the localhost loopback, but I am still seeing the same behavior.
image

This test was done by generating 1248 random bytes, publishing them, waiting 30 milliseconds, and repeating. The receiver has only a 1 ms delay, so publishing isn't faster than retrieving in this case. In addition, it seems to be cyclic: I ran this for 15 minutes and there were 2 obvious spikes/dips, so it seems to be a pattern that happens every 5 minutes. Another observation is the pattern of smaller spikes/dips between ~23 ms and 40 ms, which is consistent throughout the data.
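
For anyone who wants to reproduce the analysis, a minimal sketch of how the inter-arrival deltas can be computed and the large spikes flagged. The timestamps are made-up illustration values, not my measurements, and the helper names and the "3x the median" threshold are arbitrary assumptions.

```python
from statistics import median

def inter_arrival_ms(timestamps_s):
    """Return the gaps between consecutive receive times, in milliseconds."""
    return [(b - a) * 1000.0 for a, b in zip(timestamps_s, timestamps_s[1:])]

def find_spikes(deltas_ms, factor=3.0):
    """Return (index, delta) pairs whose gap exceeds factor * the median gap."""
    if not deltas_ms:
        return []
    limit = factor * median(deltas_ms)
    return [(i, d) for i, d in enumerate(deltas_ms) if d > limit]

# Hypothetical receive times in seconds: 30 ms spacing with one 200 ms stall.
stamps = [0.000, 0.030, 0.060, 0.090, 0.290, 0.320, 0.350]
deltas = inter_arrival_ms(stamps)
spikes = find_spikes(deltas)
print(spikes)
```

With these sample values only the stalled gap (index 3) is flagged. Comparing the indices of flagged gaps across a long run is one way to check whether the spikes are evenly spaced.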

Do you recommend any parameters to be tweaked as a workaround, at least for the high spikes?

This is also very possible:

It also could be that there are too many async tasks which causes congestion / lag of getting to the task of retrieving the publication.

Best,
AE

Hi AE, I appreciate your further experiments. I will note this issue, and look deeper to find just what might be causing this.

Do you recommend any parameters to be tweaked as a workaround, at least for the high spikes?

For a quick fix, you could try publishing data every...26 ms instead of 30 ms.

You might be wondering why a faster publication time would help with this problem.
The answer lies in how the SVS protocol is defined. Currently, all nodes will "sync" their updates every 30 ms if no update is produced.

This means that if you publish every 30 ms and a publication arrives just 1 ms late (due to delay, etc.), a sync update will be produced on the receiver side. This sync update is a task itself and could delay the receiver from retrieving the publication.
If the spike still occurs despite this change, then I do not have a solution for you at the moment, and it would in fact make me believe that an async task issue is less likely.
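
To illustrate the timing argument, here is a toy model in plain Python. It is not the library's code: it assumes a sync timer that restarts on every received publication and fires each time the sync interval passes with no update, and it takes the 30 ms figure above at face value. The function name and the arrival times are hypothetical.

```python
def count_sync_fires(arrivals_ms, sync_interval_ms=30):
    """Count timer expirations between arrivals under the toy model:
    the timer restarts at each arrival and fires whenever sync_interval_ms
    passes with no new publication."""
    fires = 0
    for prev, cur in zip(arrivals_ms, arrivals_ms[1:]):
        deadline = prev + sync_interval_ms
        while deadline < cur:  # timer expired before the next publication arrived
            fires += 1
            deadline += sync_interval_ms
    return fires

# Hypothetical arrival times (ms) where every other publication is 1 ms late.
every_30ms = [0, 31, 60, 91, 120, 151]  # published every 30 ms
every_26ms = [0, 27, 52, 79, 104, 131]  # published every 26 ms

print(count_sync_fires(every_30ms), count_sync_fires(every_26ms))
```

In this model the 30 ms schedule triggers a sync update on every late arrival, while the 26 ms schedule leaves enough slack that the timer never expires. That slack is the reason for suggesting a slightly faster publication rate.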

Kind regards,
Justin

Okay thank you. I will give it a try later. Closing this ticket for now.