`arduino-cli` on Model 100 sometimes hangs attempting 1200bps touch
Opened this issue · 0 comments
Describe the bug
I'm writing this up here to document it, because the GD32 repository doesn't have issues enabled. It's also more visible to other people who might be experiencing this problem.
On macOS (and possibly other OSes; not sure), sometimes arduino-cli
hangs when attempting to do a 1200bps touch to reset the Model 100. This seems to be due to a bug in our GD32 USB core.
To Reproduce
Repeatedly attempt to use arduino-cli
to upload, while monitoring custom trace logging of USB control requests. These control requests are the SetLineCoding requests sent by the OS when opening the serial port, possibly to set the baud rate. (I'm not sure why they're repeated so much and so quickly.) This request type has a 7-byte payload from the host. Observe that the control transfer handler doesn't correctly handle receiving SETUP packets in the middle of handling an ongoing control transfer, causing a hang of both the keyboard firmware and arduino-cli
. (arduino-cli
eventually times out.)
The hex strings are the contents of the Setup stage of the control requests, printed near the start of our control transfer handler. RW0
is the start of waitForReadComplete
, and the single 1
digit is the boolean value returned from it. RC
n is the number of bytes read by readControl
. The SURPRISE SETUP
is a special message from pollEPStatus
when it detects this condition.
[14:39:14.694] 21 20 00 00 00 00 07 00
[14:39:14.697] RW0
[14:39:14.697] 1
[14:39:14.698] RC7
[14:39:14.701] 21 20 00 00 00 00 07 00
[14:39:14.701] RW0
[14:39:14.701] SURPRISE SETUP
[14:39:14.703] 1
[14:39:14.703] RC8
Basically, while waitForReadComplete
is waiting for the Data OUT stage of a control transfer, another SETUP packet comes in, and our imperfectly-duplicated ISR fragment fails to handle it.
Expected behavior
arduino-cli
should not hang when attempting a 1200bps touch reset. We also shouldn't hang when getting an unexpected SETUP packet in the middle of a control transfer.
Environment
- OS: macOS
- Version: 10.15.7 (19H2026)
- Device: Keyboardio Model 100
Additional context
This bug is what led me to start re-implementing our control transfer handler to use a much of the low-level API as possible. Although it might theoretically be possible to keep adding hacks to pollEPStatus
to handle this situation correctly, that code is already quite unwieldy and a minefield of race conditions. I think the answer is running less code, not more, from within the ISR context.
This is a trace from prototype firmware that is rewritten to use more of the low-level API to handle control transfers. The ctlOut
n comes from the completion handler for the control Data OUT stage. The lines beginning with <
are hex dumps of the data received from the host in the Data OUT stage.
[18:50:38.362] 21 20 00 00 00 00 07 00
[18:50:38.365] ctlOut 7
[18:50:38.366] <80 25 00 00 00 00 08
[18:50:38.371] 21 20 00 00 00 00 07 00
[18:50:38.371] 21 20 00 00 00 00 07 00
[18:50:38.373] ctlOut 7
[18:50:38.374] <80 25 00 00 00 00 08
Observe that there are two back-to-back control transfers in the middle of the log, and that the first one doesn't have a ctlOut
line associated with it. The Data OUT completion handler didn't run, because the host aborted that transfer and sent another SETUP packet, which is correctly handled here.