candle-usb/candleLight_fw

Clean software reset canable

wristdirect opened this issue · 1 comment

In our system, we have three STM32F042C4 microcontrollers running canable. In our work, we often trigger a software reset of the system (our main CPU runs Linux kernel version 4.9.133).

Every once in a while (the failure rate averages around 10%), one of the three chips fails to complete initialization. The problem only seems to occur after a software reset. After a full reset (for example, a complete power cycle), all three devices always initialize properly.

Our hunch is that some variables keep their state in memory across the reset, and that this dirty state causes the device to hang afterwards.

For example, when the request to set HOST_FORMAT to 0x0000BEEF comes in, the canable device accepts it without complaint. But when the failure occurs, the BITTIMING request that follows, which takes a very similar logic path, gets nothing but not-ready NAK responses.

I've attached the portion of the USB communication log between the host and our canable device that covers most of the initialization, including the trouble spot where we get endless NAKs. Please let me know if I can provide any other useful information! My ability to debug further with the packet sniffer used to capture these logs is limited, but I can iterate on and change the firmware code itself.
USBLog_SoftRebootLostAddr3.txt

It turns out this issue was caused by not shutting down the CAN side of the system during shutdown. CAN messages continued to be sent to the canable after USB teardown, which led to undefined behavior.