hartkopp/can-isotp

missing feedback on expired timer for first flow control (FC) frame

Closed this issue · 2 comments

While using an UDS client - referred to as client - compiled from the library iso14229 for an embedded linux system I ran into the following situation:

  • After sending an UDS request to an UDS server - referred to as server - that would generate some workload, thus, stopping the CAN/ISOTP communication until end of calculation, another request is sent. This request is sent as a multiframe packet.

  • Due to the workload, the server is not able to send the first FC frame within the specified N_Bs timer (see Table 22 in ISO 15765-2:2016), which is 1 second long. This is set in:

hrtimer_sec = 1;

  • Because of this timeout, which is a known error to the kernel module, the data transfer is actually not successful. Nevertheless, on the client side, the write(fd, buf, count) call on the ISOTP socket file descriptor returns the correct number of bytes (>0) written AS IF the data was successfully transferred. This error should be propagated to upper layers so that on the UDS layer a client can get the feedback that a timeout on the transport layer occurred.

  • After checking that the payload has to be segmented and the First Frame is created and sent, the kernel module "waits for complete transmission of current pdu" at:

wait_event_interruptible(so->wait, so->tx.state == ISOTP_IDLE);

  • When the timer for the first FC frame expires then this code is reached:

case ISOTP_WAIT_FIRST_FC:

and before waking up from the waiting there ist just a debug message and the tx state of the socket is set to IDLE:

so->tx.state = ISOTP_IDLE;

I think before "waking up" sk->sk_err should be set to an error like for example ETIME like it is set in other cases:

e.g.

sk->sk_err = ENETDOWN;

or

sk->sk_err = ENODEV;

so that there is some feedback to the user on the error. Otherwise, after "waking up", the kernel module cannot recognize this timeout error when making this check:

if (sk->sk_err)

and returns size as if the bytes were correctly written.

Request: While ignoring the case that the UDS server should not stop the CAN/ISOTP communication because of some workload - which could e.g. be delegated to a worker thread - I would like to suggest that the kernel module propagates this error, since it CAN be that in an application, an ISOTP receiver does not answer with a FC frame to a multiframe transmission within the right timeout.

What do you think @hartkopp ? The developers which use the open source UDS library iso14229 in combination with the ISOTP kernel module would benefit from this improvement, because an UDS client would be able to get feedback on this transport layer error. This way, a client will not be waiting for a response that will never come since the request was not sent - but it looks like it was sent.

Hi @muehlke ,
thanks for your detailed request!
Are you working with the latest mainline kernel module or with the code from this GitHub repo?
The Linux kernel implementation is the lead development and this repo is intended for really old kernels.
And there is also a branch for 'more recent' kernels before 5.10 here: https://github.com/hartkopp/can-isotp/tree/mainline-5.4%2B

Second question: Did you try the CAN_ISOTP_WAIT_TX_DONE option?

Hi @hartkopp ,

I'm not working with the latest mainline kernel module, but also not with the code from this Github repo. I'm using the mainline kernel module from 5.15.149-mainline, to be more specific at commit 458ce51d0356ee60c93f9f807d9827cf2a41643d from the Linux Kernel - stable repo.

I was definitely on the wrong "debug" track while, as you mentioned, looking through the code of this "old" repo. After checking the source code, from which our ISOTP kernel module is built and testing out the option CAN_ISOTP_WAIT_TX_DONE, I realized that this was the mistake, I was not setting this option so there was no feedback on whether the bytes were sent or the Flow Control frame did not arrive in time.

Thank you so much for pointing this out! I will consider the problem as understood and therefore the issue as closed.