keyboardio/Kaleidoscope

`arduino-cli` on Model 100 sometimes hangs attempting 1200bps touch

Opened this issue · 0 comments

tlyu commented

Describe the bug

I'm writing this up here to document it, because the GD32 repository doesn't have issues enabled. It's also more visible to other people who might be experiencing this problem.

On macOS (and possibly other OSes; not sure), sometimes arduino-cli hangs when attempting to do a 1200bps touch to reset the Model 100. This seems to be due to a bug in our GD32 USB core.

To Reproduce

Repeatedly attempt to use arduino-cli to upload, while monitoring custom trace logging of USB control requests. These control requests are the SetLineCoding requests sent by the OS when opening the serial port, possibly to set the baud rate. (I'm not sure why they're repeated so much and so quickly.) This request type has a 7-byte payload from the host. Observe that the control transfer handler doesn't correctly handle receiving SETUP packets in the middle of handling an ongoing control transfer, causing a hang of both the keyboard firmware and arduino-cli. (arduino-cli eventually times out.)

The hex strings are the contents of the Setup stage of the control requests, printed near the start of our control transfer handler. RW0 is the start of waitForReadComplete, and the single 1 digit is the boolean value returned from it. RCn is the number of bytes read by readControl. The SURPRISE SETUP is a special message from pollEPStatus when it detects this condition.

[14:39:14.694] 21 20 00 00 00 00 07 00
[14:39:14.697] RW0
[14:39:14.697] 1
[14:39:14.698] RC7
[14:39:14.701] 21 20 00 00 00 00 07 00
[14:39:14.701] RW0
[14:39:14.701] SURPRISE SETUP
[14:39:14.703] 1
[14:39:14.703] RC8

Basically, while waitForReadComplete is waiting for the Data OUT stage of a control transfer, another SETUP packet comes in, and our imperfectly-duplicated ISR fragment fails to handle it.

Expected behavior

arduino-cli should not hang when attempting a 1200bps touch reset. We also shouldn't hang when getting an unexpected SETUP packet in the middle of a control transfer.

Environment

  • OS: macOS
  • Version: 10.15.7 (19H2026)
  • Device: Keyboardio Model 100

Additional context

This bug is what led me to start re-implementing our control transfer handler to use a much of the low-level API as possible. Although it might theoretically be possible to keep adding hacks to pollEPStatus to handle this situation correctly, that code is already quite unwieldy and a minefield of race conditions. I think the answer is running less code, not more, from within the ISR context.

This is a trace from prototype firmware that is rewritten to use more of the low-level API to handle control transfers. The ctlOut n comes from the completion handler for the control Data OUT stage. The lines beginning with < are hex dumps of the data received from the host in the Data OUT stage.

[18:50:38.362] 21 20 00 00 00 00 07 00
[18:50:38.365] ctlOut 7
[18:50:38.366] <80 25 00 00 00 00 08
[18:50:38.371] 21 20 00 00 00 00 07 00
[18:50:38.371] 21 20 00 00 00 00 07 00
[18:50:38.373] ctlOut 7
[18:50:38.374] <80 25 00 00 00 00 08

Observe that there are two back-to-back control transfers in the middle of the log, and that the first one doesn't have a ctlOut line associated with it. The Data OUT completion handler didn't run, because the host aborted that transfer and sent another SETUP packet, which is correctly handled here.