4ms/metamodule

Wifi module hangs unit if browser refreshes during an update

Closed this issue · 4 comments

  • Power up with Wifi module connected
  • Connect via the browser to the wifi module and wait until it's done loading in the browser
  • Perform a software update.
  • Refresh the browser tab while it's doing the update (can be during the "Loading update files... please wait" message, or also while checking the checksum)
    It sometimes also happens when I don't refresh the browser, but seemingly at random. It never happens if the Wifi module is not connected (via the cable) or if no browser window is open.

The console fills up in an endless loop:

No flag
No flag
[repeat a few 100 times]
USART2: RX Soft Overrun
USART2: RX Soft Overrun
[repeat a few dozen times]
...

Sometimes it does this (repeated "No flag" in the console, locking up the unit) when I insert an SD Card. This is rare and seems to happen at random.

According to the MP153 Ref Manual, an overrun (ORE) in the RX FIFO can cause the RXFNE interrupt to fire. I added a check for the ORE bit in the ISR, which clears the bit if it is set.
I also added a read of the UART RDR for the case where the interrupt is called but neither the RXFNE nor the ORE flag is set. This ensures the "No flag" error only happens once, not repeatedly.
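
As a rough sketch of the ISR logic described above (not the actual driver code): the `MockUsart` struct, the `IsrStats` counters and the function names are made up for illustration, and the bit positions are my assumption of the STM32 USART layout. The mock stands in for the hardware so the logic can run on a host.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the memory-mapped USART registers.
struct MockUsart {
    uint32_t ISR = 0; // status flags
    uint32_t ICR = 0; // flag-clear register (the mock just records the last write)
    uint32_t RDR = 0; // receive data register
};

// Assumed bit positions; check them against the reference manual.
constexpr uint32_t FlagORE = 1u << 3;   // overrun error
constexpr uint32_t FlagRXFNE = 1u << 5; // RX FIFO not empty
constexpr uint32_t ClearORE = 1u << 3;  // ORECF bit in ICR

struct IsrStats {
    unsigned bytes = 0;
    unsigned overruns = 0;
    unsigned no_flag = 0;
};

// Mimics the side effect of reading RDR. The mock behaves like a
// one-byte FIFO, so a single read clears RXFNE.
uint8_t read_rdr(MockUsart &u) {
    u.ISR &= ~FlagRXFNE;
    return static_cast<uint8_t>(u.RDR);
}

void usart_isr(MockUsart &u, IsrStats &stats) {
    const uint32_t flags = u.ISR;

    if (flags & FlagORE) {
        u.ICR = ClearORE;  // on hardware, writing ORECF clears ORE
        u.ISR &= ~FlagORE; // the mock applies that effect by hand
        stats.overruns++;
    }

    if (flags & FlagRXFNE) {
        (void)read_rdr(u); // a real driver would push this into the software FIFO
        stats.bytes++;
    } else if (!(flags & FlagORE)) {
        // Interrupt fired with neither flag set: read RDR anyway so the
        // "No flag" condition is reported once instead of looping.
        (void)read_rdr(u);
        stats.no_flag++;
    }

    // Neither flag may be left pending, or the interrupt fires again at once.
    assert((u.ISR & (FlagORE | FlagRXFNE)) == 0);
}
```

The point of the last branch is that every path through the ISR leaves the peripheral in a state where the interrupt will not immediately fire again.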

From this it's clear that the USART is receiving data faster than the M4 can read it, which fills up the hardware FIFO, presumably because the processor is busy with the update. We also get "Soft Overrun" errors, so the software FIFO is filling up too, but that can be handled in software.
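
To illustrate what handling a soft overrun in software can look like, here is a minimal sketch of a fixed-size receive FIFO that drops the incoming byte and counts the event when it is full. `RxFifo` and its members are hypothetical names, not the driver's actual class, and the sketch is single-threaded: a FIFO shared between an ISR and the main loop would also need atomic or otherwise interrupt-safe indices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical software RX FIFO: drops new bytes when full and counts the drops.
template<std::size_t N>
class RxFifo {
    static_assert(N > 0, "FIFO needs at least one slot");

    std::array<uint8_t, N> buf_{};
    std::size_t head_ = 0; // next write position
    std::size_t tail_ = 0; // next read position
    std::size_t count_ = 0;
    unsigned overruns_ = 0;

public:
    // Producer side (the ISR). Returns false on a soft overrun.
    bool push(uint8_t byte) {
        assert(count_ <= N);
        if (count_ == N) {
            overruns_++; // byte is dropped; the protocol above must cope
            return false;
        }
        buf_[head_] = byte;
        head_ = (head_ + 1) % N;
        count_++;
        return true;
    }

    // Consumer side (the main loop). Returns false if the FIFO is empty.
    bool pop(uint8_t &out) {
        if (count_ == 0)
            return false;
        out = buf_[tail_];
        tail_ = (tail_ + 1) % N;
        count_--;
        return true;
    }

    unsigned overruns() const { return overruns_; }
    std::size_t size() const { return count_; }
};
```

Counting overruns instead of logging each one would also keep the console from filling up while the consumer is busy.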

As far as I can tell, the overrun conditions don't cause any issues other than filling up the console log. So maybe it's safe to ignore them?
Otherwise I would suggest disabling the UART interrupts during updates and enabling them only for the Wifi update portion.
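
A sketch of how that second option could be structured, assuming a scoped guard around the update. `MockUartIrq`, `ScopedIrqDisable` and `run_update` are invented for illustration; on the real hardware the enable/disable calls would map to whatever the driver uses to mask the UART interrupt.

```cpp
#include <cassert>

// Hypothetical handle for the UART interrupt; counts calls so the
// behavior can be checked on a host.
struct MockUartIrq {
    bool enabled = true;
    unsigned enable_calls = 0;
    unsigned disable_calls = 0;

    void enable() { enabled = true; enable_calls++; }
    void disable() { enabled = false; disable_calls++; }
};

// Disables the interrupt for the lifetime of the guard and restores the
// previous state on scope exit, including early returns.
class ScopedIrqDisable {
    MockUartIrq &irq_;
    bool was_enabled_;

public:
    explicit ScopedIrqDisable(MockUartIrq &irq)
        : irq_(irq), was_enabled_(irq.enabled) {
        irq_.disable();
    }
    ~ScopedIrqDisable() {
        if (was_enabled_)
            irq_.enable();
    }
    ScopedIrqDisable(const ScopedIrqDisable &) = delete;
    ScopedIrqDisable &operator=(const ScopedIrqDisable &) = delete;
};

void run_update(MockUartIrq &irq, bool includes_wifi_step) {
    ScopedIrqDisable guard{irq};

    // ... load and verify update files here ...
    assert(!irq.enabled); // file handling runs with the UART interrupt off

    if (includes_wifi_step) {
        irq.enable();  // the Wifi portion needs the UART
        // ... send the Wifi firmware here ...
        irq.disable();
    }
}
```

The guard keeps the restore in one place, so an update that bails out early cannot leave the interrupt disabled.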

Also, the fact that the error sometimes happens even when not updating indicates that this full-FIFO situation can occur outside of updates too, so hopefully ignoring the errors and just clearing the flags is OK.

I pushed my changes to fix-wifi-hang-on-refresh-during-update. Let me know what you think.

Thanks for looking into this. I was also thinking about interrupt flags, so great you nailed this down.

Generally the protocol on top needs to be able to deal with dropped bytes, so if the hang is fixed it should be OK for now.
As a sidenote: there is just a simple checksum in the form of a packet length, which has worked fine until now. This needs to be very reliable because the flatbuffers inside those packets don't have any integrity checks, and the unpacking process will happily chase corrupted pointers/offsets in the packet and potentially cause UB. I have this on the list for future wifi changes.
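
For reference, one possible shape of a stronger check, purely as a sketch: this is not the current wire format, and the frame layout (4-byte little-endian length, payload, 4-byte little-endian CRC-32 of the payload) and all function names are assumptions made up for the example. The idea is to reject a frame before any flatbuffer unpacking touches it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib/Ethernet).
uint32_t crc32(const uint8_t *data, std::size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

uint32_t read_le32(const uint8_t *p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

void append_le32(std::vector<uint8_t> &out, uint32_t v) {
    for (int i = 0; i < 4; i++)
        out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Hypothetical frame: [length:4][payload:length][crc32(payload):4]
std::vector<uint8_t> make_frame(const std::vector<uint8_t> &payload) {
    assert(payload.size() <= 0xFFFFFFFFu); // length must fit the 32-bit field
    std::vector<uint8_t> frame;
    append_le32(frame, static_cast<uint32_t>(payload.size()));
    frame.insert(frame.end(), payload.begin(), payload.end());
    append_le32(frame, crc32(payload.data(), payload.size()));
    return frame;
}

// Returns true only if both the length field and the CRC match.
bool frame_is_valid(const uint8_t *frame, std::size_t size) {
    constexpr std::size_t overhead = 8; // length field + CRC field
    if (size < overhead)
        return false;
    const uint32_t len = read_le32(frame);
    if (len != size - overhead)
        return false;
    const uint8_t *payload = frame + 4;
    return crc32(payload, len) == read_le32(payload + len);
}
```

A dropped byte changes the total size, so the length check fails, and a flipped byte with the right length fails the CRC, so in both cases the payload never reaches the unpacker.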

We are already stopping the wifi comm during wifi updates but that also needs to happen for the other update type.

There are two similar USART drivers in the wifi subsystem, and both probably need this change.

Ok I added it to both drivers.

So the only remaining TODO is to stop Wifi comm during all updates.