STM32 Controller does not abort receive request after timeout
JonTBeckett opened this issue · 4 comments
Describe the bug
I see that #160 is bringing the STM32 controllers into this repo. It appears the issue is in that portion primarily. If this portion isn't maintained by USBX, feel free to direct me to the appropriate maintainers.
- What target device are you using? STM32U585
- Which version of Eclipse ThreadX? 6.1.10
- What toolchain and environment? Keil MDK v5.38 with the ARM Clang v6.20.1
Given
- A timeout value has been set for read operations via
ux_device_class_cdc_acm_ioctl()
- At least one call to
ux_device_class_cdc_acm_read()
has timed out
When the host sends data to the device while the device is NOT pending in a call to ux_device_class_cdc_acm_read()
,
Then the data sent by the host is never delivered to the device application even on subsequent calls to ux_device_class_cdc_acm_read()
It appears that the STM32 controller is not aborting the transfer request in the USB peripheral HW, causing an unexpected interrupt. Eventually HAL_PCD_DataOutStageCallback()
in the ux_dcd_stm32_callback.c
file is called in the interrupt handler. This function sets the size of the incoming data in the appropriate UX_SLAVE_TRANSFER
structure and then the appropriate semaphore is “put”. The issue comes from the fact that on the subsequent call to ux_device_class_cdc_acm_read()
, the size in the UX_SLAVE_TRANSFER
structure is set to 0, clearing any record of data already in the buffer.
Using TraceX, I confirmed that the interrupt happens before the call to ux_device_class_cdc_acm_read()
by observing that there were no threads pending on the semaphore when the semaphore is “put”.
Locally, I was able to mitigate this issue in two ways:
- Setting the transfer request size to 0 after completing the read instead of before (still setting to 0 before transmit requests).
- This obviously doesn’t directly solve the issue as there is no guarantee that the buffer provided to the USB peripheral HW when the receive request was setup is still valid when the interrupt occurs.
- Checking the return value of the semaphore “get” operation in
_ux_dcd_stm32_transfer_request()
(found inux_dcd_stm32_transfer_request.c
) and calling_ux_dcd_stm32_transfer_abort()
on appropriate errors (namely whenTX_NO_INSTANCE
is returned.- My quick patch implementation of this still has a race condition between the interrupt and the aborting of the transfer request though so it is just a proof of concept at this point.
To Reproduce
I don't have a demo project to share but I believe that any project that uses the CDC ACM Device class on STM32 will demonstrate the issue.
The following steps provide a good starting point for forcing this issue to occur.
- DEVICE: Set the timeout value for reads to a low value (100ms is what I used) via
ux_device_class_cdc_acm_ioctl()
. - DEVICE: Perform at least one read via
ux_device_class_cdc_acm_read()
that times out BEFORE sending any data from the host to the device. - DEVICE: Wait a relatively long time before performing another read (I used 15 seconds for this time period to force the issue)
- HOST: Send any data to the device (ensuring it happens after the first timeout and during the length of time described in step 3)
- Note that the device application never receives the data even on subsequent calls to
ux_device_class_cdc_acm_read()
Expected behavior
The USB peripheral HW transfer request should be aborted so that the interrupt does not occur outside of reads.
Impact
This lowers my team's confidence in USBX. It is not a show stopper as the odds of it happening during normal operations are fairly low as long as the processing time between calls to ux_device_class_cdc_acm_read()
is small. We will be doing further stress testing to see how often we hit it in general operating conditions and may have to move away from USBX if it occurs frequently enough.
I see that the related PR has been closed without merging because the files will be delivered in a dedicated repo. Is there any info on which repo and where that will be?