commaai/panda

Panda DFU recovery

Closed this issue · 3 comments

Panda (C3X) doesn't always enter DFU when running recover.py.

I tried rebooting and entering DFU manually. After that it would either work many times in a row or repeatedly fail to find the DFU device.

Steps to reproduce:

  1. Turn on device and connect via ssh.
  2. Run /data/openpilot/panda/board/recover.py
  3. Gets stuck forever. Pressing Ctrl+C gives:
comma@comma-609a2742:/data/openpilot/panda/board$ ./recover.py
scons: Entering directory `/data/openpilot/panda'
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
scons: `board' is up to date.
scons: done building targets.
^CTraceback (most recent call last):
  File "/data/openpilot/panda/board/./recover.py", line 13, in <module>
    for s in Panda.list():
             ^^^^^^^^^^^^
  File "/data/pythonpath/panda/python/__init__.py", line 399, in list
    ret += cls.spi_list()
           ^^^^^^^^^^^^^^
  File "/data/pythonpath/panda/python/__init__.py", line 423, in spi_list
    _, _, serial, _, _ = cls.spi_connect(None, ignore_version=True)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/pythonpath/panda/python/__init__.py", line 323, in spi_connect
    dat = handle.get_protocol_version()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/pythonpath/panda/python/spi.py", line 270, in get_protocol_version
    with self.dev.acquire() as spi:
  File "/usr/local/pyenv/versions/3.11.4/lib/python3.11/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/data/pythonpath/panda/python/spi.py", line 111, in acquire
    fcntl.flock(self._spidev, fcntl.LOCK_EX)
KeyboardInterrupt
^C

Are you connected to a PC? Might be a race condition with USB enumeration in the ST bootloader.

https://www.st.com/resource/en/application_note/an2606-stm32-microcontroller-system-memory-boot-mode-stmicroelectronics.pdf

This is on C3X. I have it connected to 12V for this test.

Seems to me this is an SPI communication issue:

  • I found that ./recover.py only gets stuck when pandad.py is active.
  • pandad.py errors out when that happens.
selfdrive/pandad/spi.cc:   1 / 0x0 / 7 / 12 / tx: 14141414141414
selfdrive/pandad/spi.cc: SPI: timed out waiting for ACK, waiting for 0x1f
selfdrive/pandad/spi.cc:   1 / 0x0 / 7 / 12 / tx: 14141414141414
selfdrive/pandad/spi.cc: SPI: timed out waiting for ACK, waiting for 0x1f
selfdrive/pandad/spi.cc:   1 / 0x0 / 7 / 12 / tx: 14141414141414
selfdrive/pandad/spi.cc: SPI: timed out waiting for ACK, waiting for 0x1f
selfdrive/pandad/spi.cc:   1 / 0x0 / 7 / 12 / tx: 14141414141414
selfdrive/pandad/spi.cc: SPI: timed out waiting for ACK, waiting for 0x1f
selfdrive/pandad/spi.cc:   1 / 0x0 / 7 / 12 / tx: 14141414141414
(the same two lines repeat indefinitely)
  • After killing pandad.py, ./recover.py shows 0 dfu devices, even on repeated attempts.
  • Only after relaunching pandad.py and then killing it again does ./recover.py list the dfu device.
  • Resetting the panda via the NRST pin also fixes the "recover.py showing 0 dfu devices" state.

Looks like some clash when accessing SPI.
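Since the hang only shows up while pandad.py is active, a pre-flight check before running recover.py may help confirm that. The sketch below is not part of the panda repo; `find_processes` is a hypothetical helper that scans a Linux-style /proc tree for command lines containing a given string, and the "pandad" needle is an assumption about how the process shows up on the device.

```python
import os


def find_processes(needle, proc_root="/proc"):
    """Return (pid, cmdline) pairs whose command line contains `needle`.

    Hypothetical helper: relies on the Linux /proc layout, where each
    numeric directory holds a NUL-separated `cmdline` file.
    """
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or is not readable
        cmdline = raw.replace(b"\x00", b" ").decode(errors="replace").strip()
        if needle in cmdline:
            matches.append((int(entry), cmdline))
    return matches


if __name__ == "__main__":
    running = find_processes("pandad")
    if running:
        print("pandad appears to be running, recover.py may hang:")
        for pid, cmdline in running:
            print(f"  {pid}: {cmdline}")
    else:
        print("no pandad process found")
```

If this reports a pandad process, stopping it first would be the thing to try before rerunning ./recover.py.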

Oh yeah, this is expected since the ST bootloader state machine is super sensitive. recover.py doesn't take an exclusive lock over the SPI device, so pandad's comms still go through.
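For illustration, here is a minimal sketch of taking an exclusive advisory lock with a timeout, so a caller fails fast instead of blocking forever the way the `fcntl.flock(self._spidev, fcntl.LOCK_EX)` call in the traceback above does. `exclusive_lock` and its parameters are hypothetical, not the panda library's API, and the lock only helps if every process touching the device takes the same lock, which is an assumption about pandad.

```python
import fcntl
import os
import time
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path, timeout=5.0, poll=0.05):
    """Hold an exclusive flock on `path`, giving up after `timeout` seconds.

    Hypothetical helper. flock is advisory: it only excludes other
    processes that also call flock on the same file.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        deadline = time.monotonic() + timeout
        while True:
            try:
                # LOCK_NB makes flock raise instead of blocking indefinitely
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not lock {path} within {timeout}s")
                time.sleep(poll)
        try:
            yield fd
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    import tempfile

    # A temporary file stands in for the SPI device node here.
    with tempfile.NamedTemporaryFile() as tmp:
        with exclusive_lock(tmp.name, timeout=1.0):
            print("lock held")
```

With a timeout, a stuck lock surfaces as a `TimeoutError` rather than a hang that needs Ctrl+C.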