NXP/isochron

Premature/Late Transmission

Closed this issue · 3 comments

Hey, I'm trying to use isochron between two Intel I210 NICs with a gPTP connection over Ethernet and cannot get it working. I have also configured traffic control on both devices to use taprio.

phc2sys and ptp4l are both running perfectly fine and the computers are synced.

For the sender side, I'm using the following command:
sudo isochron send -i enp5s0 -A 00:1b:21:ed:ae:2b -d 00:1b:21:ed:b0:ed -c 0.005 -s 64 -t 1 -X 20000 -Q -n 1000

The receiving side is using the following command:
sudo isochron rcv -i enp3s0 -d 00:1b:21:ed:b0:ed -t 1

These commands produce the following on the server side:

...
isochron[1686843351.372797772]: local ptpmon            0 sysmon               53
isochron[1686843351.912979790]: local ptpmon            0 sysmon            -103
Premature transmission detected for seqid 1 scheduled for 1686843353.395000000: TX hwts 1686843353.390166134
Timed out waiting for TX timestamps, 1000 timestamps unacknowledged 
tx timestamp thread failed: Invalid argument
[1686843353.395000000] seqid 1 txtstamp 1686843353.390166134 swts 1686843353.390136616
[1686843353.400000000] seqid 2 txtstamp 0.000000000 swts 0.000000000
[1686843353.405000000] seqid 3 txtstamp 0.000000000 swts 0.000000000
[1686843353.410000000] seqid 4 txtstamp 0.000000000 swts 0.000000000
...

and the client side showing:

Discarding seqid 1
Discarding seqid 2
Discarding seqid 3
...

The only notable change I've noticed from playing around with the flags is that changing the cycle time to something large (e.g. 5000) causes the program to detect a late transmission rather than a premature one. Everything else remains the same.

Any ideas of what may be causing this?

Let me know if there are any other pieces of information you may require, thanks :)

changing the cycle time to something large (e.g. 5000) would cause the program to detect a late transmission rather than a premature one

FYI, --cycle-time 5000 gets interpreted as 5000 nanoseconds, not 5000 seconds, so that's not what I would call "large".
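To put that in numbers, here is a quick sketch comparing 5000 ns with the cycle actually visible in the log above, where consecutive scheduled times (seqid 1 and seqid 2) are 5 ms apart. The variable names are only for illustration.

```python
# Nanosecond parts of the scheduled times for seqid 1 and seqid 2 in the log
# (both fall within the same second, 1686843353).
seqid1_sched_ns = 395_000_000
seqid2_sched_ns = 400_000_000

# Cycle spacing observed with "-c 0.005".
observed_cycle_ns = seqid2_sched_ns - seqid1_sched_ns

# "--cycle-time 5000" is taken as 5000 nanoseconds.
requested_cycle_ns = 5000

ratio = observed_cycle_ns // requested_cycle_ns
print(f"observed cycle: {observed_cycle_ns} ns")
print(f"5000 ns is {ratio}x shorter than that")
```

So the "large" value is in fact a 5 microsecond cycle, far tighter than the original 5 ms one.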

The program is not wrong in saying that there is a premature packet transmission. The first packet is sent 4833866 ns earlier than expected, so this gets detected and the subsequent packets no longer get sent.
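For reference, the 4833866 ns figure can be reproduced from the log line for seqid 1. This is only a small sketch of the arithmetic; the helper `to_ns` is hypothetical and not part of isochron, and I am assuming `swts` is the software timestamp taken when the packet was handed over for transmission.

```python
NSEC_PER_SEC = 1_000_000_000

def to_ns(ts: str) -> int:
    """Convert a 'seconds.nanoseconds' string from the log to integer nanoseconds."""
    sec, _, frac = ts.partition(".")
    # Pad or truncate the fractional part to exactly 9 digits.
    return int(sec) * NSEC_PER_SEC + int(frac.ljust(9, "0")[:9])

# Values copied from the seqid 1 line in the log above.
scheduled_ns = to_ns("1686843353.395000000")
tx_hwts_ns = to_ns("1686843353.390166134")
swts_ns = to_ns("1686843353.390136616")

early_by_ns = scheduled_ns - tx_hwts_ns   # how far ahead of schedule the frame left
queue_delay_ns = tx_hwts_ns - swts_ns     # gap between software and hardware timestamps

print(f"early by {early_by_ns} ns")
print(f"hardware timestamp follows software timestamp by {queue_delay_ns} ns")
```

The gap between the two timestamps is about 30 microseconds, which suggests the frame went out on the wire almost immediately instead of waiting for its time slot.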

This is with taprio rather than with SO_TXTIME, so what probably happens is that isochron is misaligned with the hardware schedule. Check the diagrams in man isochron to see how and why it wakes up in advance of its actual deadline, for the sendmsg() system call.

The program assumes that when it passes a packet to the kernel, the taprio gate is currently closed and will open in the future, and it maximizes its advance time to have the most comfortable real time deadline. But if the sendmsg() takes place so far in advance of the time slot that the (previous) taprio traffic class gate is already/still open, then of course the packet will be transmitted right away. See how you can fix the alignment with the --base-time, --shift-time and --window-size arguments (unless you calculate an --advance-time manually).
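To illustrate the idea, here is a toy model of a single traffic class gate. It is not how taprio or isochron are implemented, and every number in it (the 200 us window, the offsets, the advance times) is hypothetical, since the actual taprio schedule was not posted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GateWindow:
    """One gate that opens once per cycle (toy model).

    Assumes open_offset_ns + open_length_ns <= cycle_time_ns,
    so the window never wraps around the end of a cycle.
    """
    base_time_ns: int
    cycle_time_ns: int
    open_offset_ns: int   # where the open window starts inside the cycle
    open_length_ns: int   # how long the gate stays open

    def phase(self, t_ns: int) -> int:
        return (t_ns - self.base_time_ns) % self.cycle_time_ns

    def is_open(self, t_ns: int) -> bool:
        p = self.phase(t_ns)
        return self.open_offset_ns <= p < self.open_offset_ns + self.open_length_ns

    def tx_time(self, enqueue_ns: int) -> int:
        """Leave at once if the gate is open, otherwise at the next opening."""
        if self.is_open(enqueue_ns):
            return enqueue_ns
        wait = (self.open_offset_ns - self.phase(enqueue_ns)) % self.cycle_time_ns
        return enqueue_ns + wait

# Hypothetical schedule: 5 ms cycle, gate open for the first 200 us of each cycle.
gate = GateWindow(base_time_ns=0, cycle_time_ns=5_000_000,
                  open_offset_ns=0, open_length_ns=200_000)
deadline_ns = 10 * gate.cycle_time_ns  # start of some future time slot

# Enqueued 100 us ahead: gate is closed, packet waits for its slot.
on_time = gate.tx_time(deadline_ns - 100_000)

# Enqueued 4.9 ms ahead: the previous cycle's window is still open.
premature = gate.tx_time(deadline_ns - 4_900_000)

print(on_time - deadline_ns)     # offset from the deadline in ns
print(premature - deadline_ns)   # negative means sent too early
```

In the second case the packet is handed over while the window from the previous cycle is still open, so it leaves right away, which is the kind of misalignment described above.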

I suspect the problem is that --window-size is not specified.

Another small comment: if the MAC addresses are the true MAC addresses of the NICs, then you don't need to specify them as command line options; the program will figure them out and pass them around using its management protocol (assuming you enable that protocol using the --client option in the sender).