hartkopp/can-isotp

wait_tx_done does not seem to be working [RPi + python-can-isotp]

dpatel20 opened this issue · 31 comments

I am enabling the wait_tx_done flag via python-can-isotp project and if I read back the socket opts it is showing as set:

<OPTS_GENERAL: optflag=0x404, frame_txtime=0x0, ext_address=0x0, txpad=0x0, rxpad=0xcc, rx_ext_address=0x0>

However, when I test sending a large message (>3kB), the send returns almost immediately (within ~2ms).

The kernel I have is: 5.13.0-1016-raspi

Is there a way to know if the can-isotp I have supports the wait_tx_done flag? And, if so, any idea why this might not be working for me?

I assume the receiving communication part is setting the blocksize to zero??
If so the CAN frames are dumped into the netdevice queue and the transmission is done (from the hosts standpoint).
If you set a blocksize ( 1 .. 15 ) or a STmin value the send() should not return immediately.

Perhaps I am misunderstanding how this works but from this issue, I had understood that if I send a large packet, the socket.send() would not return until all of the frames are sent. And that is regardless of STmin/blocksize (so if receiver doesn't request separation it will send as fast as possible but the send won't return?

I am sending >3kB which, from logs, takes maybe 200ms & nearly 500 frames, but the socket.send() returns within <2ms (I put debug prints either side of the send).

I tried reducing txqueuelen to 256 & even lower to ensure the send() couldn't dump all the frames in one go (not sure if this how it works). But same result.

And that is regardless of STmin/blocksize (so if receiver doesn't request separation it will send as fast as possible but the send won't return?

In this case the send() will return instantly as the ISO-TP state machine would dump the CAN frames to the netdevice and is done.

Please set up a receiving counterpart which sets the BS and/or STmin to see the effect. Also works on virtual CAN.

Terminal 1:
candump vcan0

Terminal 2:
isotprecv -l -s 123 -d 321 vcan0 -m 50

Terminal 3:
isotpsend -D 300 -s 321 -d 123 vcan0

I am definitely misunderstanding how this works then! :)

Could you please help me understand what happens differently when receiver has STmin>0? At which point does the send() return if sending a 3kB packet made up of ~500 frames when STmin>0?

I think what you're saying is if STmin>0 (or blocksize), then the send() will not return until every frame (~500 in example case) are sent. But if STmin & blocksize=0, then send will return immediately.

Is that correct understanding? If so, is there any possibility to have the send() not return until all frames are sent even if STmin/blocksize=0?

I think what you're saying is if STmin>0 (or blocksize), then the send() will not return until every frame (~500 in example case) are sent.

Not exactly. Please read what I wrote: "In this case the send() will return instantly as the ISO-TP state machine would dump the CAN frames to the netdevice and is done."

But if STmin & blocksize=0, then send will return immediately.

Same here. Read the line above.

Is that correct understanding? If so, is there any possibility to have the send() not return until all frames are sent even if STmin/blocksize=0?

No that's currently not possible. This would need a rework of the ISO-TP kernel driver to have a throttling for CAN interfaces with small tx queues. The ISO-TP is an unreliable datagram protocol. You are sending things. And wait for the answer with a timer on application level (e.g. for UDS). Why do you need the "end of transmission on the bus"? Just because there is a value in the spec - or do you have a real use-case?

Yes, this is a real world use case. It is pretty much the same scenario in the issue I linked to earlier.

I am doing data transfer (software update). The module in my case is sending STmin/blocksize=0 (so I can't change it). I need to check the timeouts and, right now, it is starting the timer in udsoncan almost immediately after send() is called. I was hoping that the timer could be started once the full set of data is sent like in the linked issue. Of course, I can increase the timeout but I was hoping for a more accurate timing so I could use the P2 timeouts reported by the server (udsoncan has an option of use_server_timing which is the time from when the module received the complete message to when to expect a response).

I see your problem to get more reliable timeout values. But currently the implementation does not provide this information - and due to the common concept of unreliable datagrams there is no strong requirement IMO. Btw. implement this throttling option would make it easier to make send() return as you suggested.
Till then I would suggest to calculate the time the frames needs on the bus for that specific bitrate and calculate a potential end of transmission time on this basis.

Thanks for the info.

Before I close this issue, could you explain why the timing issue was resolved in the earlier linked issue? i.e. with STmin=0x0A.

Before the change the user reported:

why the socket returns early and reports sending all the 4kb data, while the data is still buffering on the underlying layers, as seen on the isotp kernel module.

And after the change, they no longer have the timeout issue even though it takes 6s to send the data.

I'd like to understand how this is working in this use case (what is the difference to when STmin=0).

As a thought, is there a way to set transmit STmin regardless of value received from receiver?

I'd like to understand how this is working in this use case (what is the difference to when STmin=0).

STmin makes the isotp stack separate the CAN frame transmission.
In the example the STmin was set to 10ms - while the CAN frame usually last ~1ms on the bus.

As we need to make sure the separation time is guaranteed until the very last frame, the transmission is ending there.
That's why send() returns only when the PDU is done.

As a thought, is there a way to set transmit STmin regardless of value received from receiver?

Oh, good idea! Yes, there is an option frame_txtime in the socket options which is added as 'known CAN frame bus occupation time' to the STmin you get from the receiving node.

Additionally there is the option to force a STmin value which overwrites the value you get from the receiving node. Search for CAN_ISOTP_FORCE_TXSTMIN option and fill force_tx_stmin accordingly.

Both these additions cause the tx_gap value in the isotp.c to be not zero, which creates a separation time and disables the isotp_tx_burst functionality (search for it in isotp.c).

That should help ;-)

I have used the python-can-isotp library to test with and have set:

s.set_opts(txpad=0, optflag=0x400, tx_stmin=10000)

From the docs:

tx_stmin (int) – Sets the transmit separation time (time between consecutive frame) in nanoseconds. This value will override the value received through FlowControl frame. If not None, flags.FORCE_TXSTMIN will be set

But unfortunately it still didn't wait until the end of the full packet to be sent and I didn't see any change in behaviour. I then increased tx_stmin to 1000000 and that seemed to do the trick. Now, the send() returns after the complete is sent!

But I don't understand why I had to set that to 1000000 - any ideas? (Even 100000 was not enough).

And I guess the next issue is that I can't know that the receiver has set STmin=0 until I start sending. And I can't change the tx STmin once the socket is bound. Which means if a module responds with STmin>0 I will now be making it unnecessarily slower (STmin + frame_txtime ) or overwrite it with an incorrect value (if using force_tx_stmin) .

Hi @hartkopp,
I think we had this discussion in the past, but let's bring it back. I don't know much about the internals of a kernel driver, but is it possible to poll the driver to get the completion ratio of the transmission? I'm pretty sure that would be enough for the timer issues of @dpatel.

Peoples are harassing me as well for this feature. ISO-26262 has a requirement for a back-notification to the upper layer when completion is done.

@pylessard
I've been thinking about this and I think polling the driver won't work. Per @hartkopp comments above, can-isotp passes the data to the netdevice queue. At that point, can-isotp has no idea when the message leaves the queue and goes out onto the bus. I think the best we'd get back is that the messages have been placed into the queue (and that is what I understand wait_tx_done does). If the queue is large and empty enough, then it could be that the entire packet gets placed into the queue immediately and it returns straight away (assuming STmin=0).

And we can't wait for the queue to empty either because something else may be sending to the same device.

The only thing I could think of is if python-can-isotp could wait to see when the last frame appears on the bus. So, python-can allows receiving of own messages. My thought is that python-can-isotp checks received messages from the bus for the last frame it sent out. Once it sees the last frame, then it knows the entire message made it to the bus. Of course it could fail to send so they'd need to be some timeout or other failsafe. Not very efficient I suspect but it's all I could think of...what do you think?

PS: I am still curious as to why my force_tx_stmin didn't work until I increased the value to 1000000. @hartkopp Any insights into that would be appreciated. Thanks.

The only thing I could think of is if python-can-isotp could wait to see when the last frame appears on the bus. So, python-can allows receiving of own messages.

I'm currently thinking about this approach for the kernel implementation too.
So whenever I put a CAN frame into the queue, I could wait for this frame to be echo'ed back after it was sent out onto the CAN bus.

But this would have a heavy performance impact as we usually would queue up on the netdev queue and I have no clue about other outgoing traffic from that host.

And I would need to rethink the entire timeout mechanisms. Would you think waiting for every outgoing CAN frame to be sent would be fine from the performance view?

PS: I am still curious as to why my force_tx_stmin didn't work until I increased the value to 1000000. @hartkopp Any insights into that would be appreciated. Thanks.

No idea. 1000000 us == 1 sec == general ISO-TP timeout for lost connection. But I will look into it.

Would you think waiting for every outgoing CAN frame to be sent would be fine from the performance view?

Yes, because - at least from a UDS point of view - that is exactly what we need. i.e. do not return from send() until all packets are on the bus (or a failure occurred). Perhaps then it makes more sense to do this at the python-can-isotp level so user can choose. And, then all supported interfaces (Vector, Kvaser, etc) would benefit not just socketcan.

I guess there would be some performance issues to read back the messages coming from the bus but I think the situation is no different to receiving messages currently (e.g. when you send and then await a response).

@pylessard what do you think on this topic?

Hi @dpatel20 ,

can you build your own Linux kernel for a test?
Below patch (based on Linus' git tree 5.17-rc6) is a proof of concept which uses the local echo feature to block the sending of consecutive frames until the former CF has been sent on the bus.
I think the code needs some more review and maybe also some new error/timeout handling for this specific feature.

But it works for testing and should really return only when the last frame is sent out.

Updated (cleanup & fixed some stuff like timer handling) :

0001-can-isotp-add-local-echo-handling-for-consecutive-fr.patch.txt

Thanks for the patch. I haven't built a kernel before but might be able to give it a go!

Thanks for the patch. I haven't built a kernel before but might be able to give it a go!

Thanks! That would be really valuable as you have some intrinsic motivation for this feature to work :-D
I currently tested it on vcan0 only and it worked surprisingly well. Should be even better with a physical CAN interface.
Please let me know if you get into problems building your own kernel.

Other kernels (5.15 or something like that) should work too. If you have problems to apply the patch to an older kernel I can provide a backport for you.

@dpatel20 @pylessard
I posted a patch set on the Linux-CAN mailing list for the Linux mainline process here:

https://lore.kernel.org/linux-can/20220306193454.33158-1-socketcan@hartkopp.net/T/#m95217a28b4e0f208dac844d4f9b52fd808760263

The raw patch is here:
https://lore.kernel.org/linux-can/20220306193454.33158-1-socketcan@hartkopp.net/raw

I tested and beautified the code this weekend - and I'm pretty confident that it reliably does what we've been searching for.

If that works well, I will try to see if I can do something similar in my library and make sure the timeout handling is the same with the kernel module interface and the pure python implementation.

Good works!

Short update. I can't get the patch to apply (using https://lore.kernel.org/linux-can/20220306193454.33158-1-socketcan@hartkopp.net/raw).

Hunk #5 FAILED at 886.

I haven't had chance to look into why the patch fails at that point. I guess it's because of differing kernel version. I have 5.13.0-1017-raspi so I suspect the isotp.c file differs between mine and 5.17 that the patch is based on.

There is only one patch in the 5.13 kernel that should prevent the compilation:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e3ae2365efc14269170a6326477e669332271ab3

I'm currently compiling a 5.13 kernel. Will upload a working isotp.c file for the 5.13 for you here in some minutes.

isotp.c.txt

Just remove the .txt extension. This isotp.c compiles with 5.13

The patch is now on its way upstream to the Linux mainline kernel:

https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/commit/?h=testing&id=fb440a021a2e99ece37e6528391751dc51ae1f6e

I'm really confident and happy about this solution. I should have done this earlier ;-)
Thanks for pushing again and again :-D

That's great news!
I am still trying to build the kernel. I had it building last night but it failed overnight as I ran out of disk space on my Pi! Being the first time I've ever built a kernel, I didn't realise it used ~20GB disk space.

Should be finished re-building soon and then I can test. Hopefully my Pi boots at least once I switch to the kernel I built.

Just to check, the echo feature is on by default? So, no extra flags to set or is it tied to wait_tx_done?

Should be finished re-building soon and then I can test. Hopefully my Pi boots at least once I switch to the kernel I built.

Looking forward to your feedback!

Just to check, the echo feature is on by default? So, no extra flags to set or is it tied to wait_tx_done?

The wait_tx_done feature works like this:

  • You open a socket
  • You send a PDU with write()
  • depending on wait_tx_done
  1. the write() syscall returns immediately (and only blocks when you write a second PDU while the first is not completely transmitted)
  2. the write() syscall returns when the PDU is transmitted completely => wait_tx_done

There is no visible effect with isotpsend as closing the socket at the programs exit also blocks the close() syscall until the PDU is sent.

Not sure if I am reading your code correctly but is the following understanding correct: In a multi frame message, the next consecutive frame is not sent until the current one is seen on bus?

If so, then the one small concern I have is that this could be a bit slower to send a multi frame message. For example, we're sending a multi frame message and the same PC is also sending other CAN messages. can-isotp places one frame onto the netdevice queue and then waits for it to be seen on bus. In the meantime, some other messages are placed into the netdevice queue. Now can-isotp sees its frame on the bus and places next frame on to the netdevice queue. But that frame now has to wait for the rest of the frames in the queue to be sent.

If we only cared about the last frame of a multi frame being seen on bus, then can-isotp could place all the messages on to the queue as fast as possible (respecting STmin/block size) and there are potentially less 'other' frames in between. And only when the last frame is seen does it return.

Above is a typical use case for me...I send a number of cyclic frames to keep the module awake whilst also performing multi frame UDS (e.g. transfer data). So, two things are trying to place data onto the same netdevice queue.

Not sure, in reality, if this will make much difference but wanted to see if I understood the code correctly or not.

Not sure if I am reading your code correctly but is the following understanding correct: In a multi frame message, the next consecutive frame is not sent until the current one is seen on bus?

ACK

If so, then the one small concern I have is that this could be a bit slower to send a multi frame message.

Yes. It could be a bit slower.

For example, we're sending a multi frame message and the same PC is also sending other CAN messages. can-isotp places one frame onto the netdevice queue and then waits for it to be seen on bus. In the meantime, some other messages are placed into the netdevice queue. Now can-isotp sees its frame on the bus and places next frame on to the netdevice queue. But that frame now has to wait for the rest of the frames in the queue to be sent.

The interaction of different 'writers' on the same host can always lead to a problem with the netdevice queue. E.g. when you place a big chunk of isotp frames with no gap into the netdev queue the other senders will get out of 'in-time' transmissions.

There is a solution for this CAN-ID wise controlling of the outgoing traffic -> use queuing disciplines with the tc tool.

See concept: https://wiki.automotivelinux.org/_media/agl-distro/agl2017-socketcan-print.pdf (slide 41ff)
See details: http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf

But I'm pretty sure you won't need it by default.

If we only cared about the last frame of a multi frame being seen on bus, then can-isotp could place all the messages on to the queue as fast as possible (respecting STmin/block size) and there are potentially less 'other' frames in between. And only when the last frame is seen does it return.

Yes. I've been thinking about writing e.g. 8 frames as a set of bulk content into the netdev queue and then wait for the last frame to be transmitted. The point was that the code got really complex and the reliability of catching the right frame will decrease. I did multiple tests with real (USB) CAN interfaces here and the performance was still excellent.

IMO this comparably simple and dump transmission process does a good reliable job (much much better than the former unreliable dumping into the queue).

Above is a typical use case for me...I send a number of cyclic frames to keep the module awake whilst also performing multi frame UDS (e.g. transfer data). So, two things are trying to place data onto the same netdevice queue.

Yes. That's also my common use-case to create the 'tester present' with the CAN_BCM socket. The tc tool would be your friend if you really run into problems with this. But I really don't assume we will need to fiddle with tc for this kind of use-case.

Not sure, in reality, if this will make much difference but wanted to see if I understood the code correctly or not.

Yes. You understood the code correctly. Thanks for the review ;-)

Thanks for the explanation above. The queuing disciplines seems very interesting, so will take deeper dive into that.

And, I have now done my own testing and your new code works superbly. The send does not return until all the frames are sent so now my timing measurements are accurate. :)

Also, I haven't seen any issues with this in my testing.

Thank you for the excellent implementation.

(And bonus for me: I got to build a Linux kernel for the first time!)

(And bonus for me: I got to build a Linux kernel for the first time!)

Ah! Win-Win :-D

Thanks for the valuable feedback!

@hartkopp
How do I track which Linux kernel this gets merged into?

The changes are in the upstream process fir Linux 5.18.
They are currently sitting in the net-next development tree and will be pulled into Linus' tree the next days during the current merge window.

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/log/net/can/isotp.c