serialport/serialport-rs

Last port.write does not send all bytes on FTDI dongle

brainstorm opened this issue ยท 7 comments

As discussed on matrix.org and documented here: https://github.com/brainstorm/serialport-rs-reproducer, thanks jr_fn and @sirhcel for following up with me over the channel.

Thank you very much for your awesome and comprehensive example! I finally can confirm seeing truncated data on the wire when using FTDI FT232R based cable (TTL-232R-RPI). And as you already reported, just adding a port.flush() to this example does not resolve the issue.

The writes in the example theorectically might get truncated and I've created brainstorm/serialport-rs-reproducer#1 for addressing this and adding a simple CLI for experimenting. Playing around a little bit gives me the following impression:

  • If/how much data is sent on the wire depends on the serial driver (FTDI versus SiLabs, I'm using the drivers shipped with macOS)
    • The FTDI driver seems to stop the transmission upon closing the port
      $ cargo run --release -- /dev/tty.usbserial-FTDI
      
    • Transmitting at a higher baud rate exposes more data on the wire
      $ cargo run --release -- --baud 57600 /dev/tty.usbserial-FTDI
      
    • The SiLabs driver seems to flush (drain in termios speek) the data upon closing the port
  • On Linux, all data is on the wire even with the FT232R device
  • Explicitly flushing data with Write::flush does not seem to happen macOS independently of the serial driver
    • Visible for the FTDI driver on macOS with
      $ cargo run --release -- --flush-before-drop /dev/tty.usbserial-FTDI
      
    • Flushing and sleeping to generate packet separators still results in a continuous burst of data on the wire when the separation is smaller than the transmission time of the packets, for example
      $ cargo run --release -- --packet-separation-ms=4 /dev/tty.usbserial-SILABS
      
    • Increasing the separator duration on macOS finally results in a separation of the packets on the wire but it looks like the given separation delay applies from packet start to packet start
  • Attempting to separate the packets on Linux works from the first millisecond
    • For example
      $ cargo run --release -- --packet-separation-ms=1 /dev/ttyUSB-FTDI
      

Do we have an issue with flushing in general on macOS?

A Python variant of this test (reproducer.py) exposes similar behavior on macOS: truncated data on the wire, no flushing and therefor delays for spacing packets apparently applied from start to start.

Hi @sirhcel, @brainstorm told me about this issue and I got nerd-sniped into having a look. ๐Ÿ˜†

I'm 99% sure this is either a MacOS FTDI VCP driver bug, or a silicon bug! It only seems to appear on macOS, as only the macOS driver seems to send a "Reset" command to the FTDI chip when it closes the port (as per screenshot on the repro repo).

Here's Linux (using in-kernel driver):
Screenshot_20230807_174534

(In fact, the Linux ftdi_sio.c driver doesn't have a custom close handler, it calls the generic usbserial close handler which only updates the control line status.)

Here's Windows (using whatever default drivers Windows installs):
Screenshot_20230807_180556

I also randomly bumped into a stackoverflow issue reporting the same behaviour on macOS, from 2014!

I don't think there's anything that a userspace library can do about this. :|.


If it's no longer a serialport-rs issue, I'd suggest the question becomes: should the FTDI "Reset" command clear the chip's TX FIFO? If yes, this is a bug in the macOS FTDI VCP driver and should be fixed by FTDI so it doesn't send this command when you close the serial device. If no, this is a bug in the FTDI clone chips and I suspect FTDI will not want to change the driver to accommodate. ๐Ÿ˜† .

FTDI don't document their USB protocol, so I guess the challenge is to reproduce on a known genuine FTDI chip. I'm not sure I have a 100% definitely genuine FT232 family chip at hand, but maybe someone does...

I'm 99% sure this is either a MacOS FTDI VCP driver bug, or a silicon bug!

After further looking into it and preparing the example flush-and-close.c for filing "feedback" to Apple, it looks to me that we are facing a more general issue here. Truncated or even garbled data shows up with pySerial and with directly using termios in flush-and-close.c. And with the larger output data from the latter, truncated data shows up with the SiLabs CP2102N dongle too:

And with the FTDI FT232R dongle, the output data is garbled and looks like written data did not got appended to the transmit buffer but somewhat overwrote existing data:

To me this looks now like we are facing an issue with Apple's termios implementation or serial driver framework. Cross checking with flush-and-close.c on Linux exposes the expected behavior:

It only seems to appear on macOS, as only the macOS driver seems to send a "Reset" command to the FTDI chip when it closes the port (as per screenshot on the repro repo).

Resetting the port or IC might still be fine in case that flushing/draining the data beforehand works.

I also randomly bumped into a stackoverflow issue reporting the same behaviour on macOS, from 2014!

Such reports lower my expectation that Apple is keen on diagnosing and fixing such issues. :-|

FTDI don't document their USB protocol, so I guess the challenge is to reproduce on a known genuine FTDI chip. I'm not sure I have a 100% definitely genuine FT232 family chip at hand, but maybe someone does...

I bought the serial hardware I'm using for these tests from DigiKey so I'm assuming that I'm testing on genuine hardware from these vendors.

If it's no longer a serialport-rs issue, I'd suggest the question becomes: should the FTDI "Reset" command clear the chip's TX FIFO? If yes, this is a bug in the macOS FTDI VCP driver and should be fixed by FTDI so it doesn't send this command when you close the serial device. If no, this is a bug in the FTDI clone chips and I suspect FTDI will not want to change the driver to accommodate. ๐Ÿ˜† .

It looks to me that we already left the scope of serialport-rs and the issues lies deeper down in the plumbing. On top of that it looks to me like neither FTDI nor SiLabs are providing drivers for Apple silicon as of now and the drivers shipped by Apple are the only ones available on this hardware so far.

Let's see if there will be a timely response from Apple regarding the "feedback" ...

That's extremely interesting, @sirhcel!

the output data is garbled and looks like written data did not got appended to the transmit buffer but somewhat overwrote existing data

๐Ÿ˜ฎ

should the FTDI "Reset" command clear the chip's TX FIFO?

It's a slightly moot point now, but I looked into this a little more. In the Linux driver there are some USB protocol details that credit FTDI for help with the protocol so they're probably correct. If I read the capture from @brainstorm's repo correctly then the host is sending individual Reset commands for "purge RX buffer", "purge TX buffer", and then "purge RX buffer" again! It seems from the description that the intent for these commands is to drop bytes in the buffers, not wait for them to be sent.

I don't see any FTDI commands for managing the TX buffer otherwise (waiting for it to be empty, or checking if it's empty). So once the data goes to the chip the only options might be to wait for the bytes to be sent on the wire, or reset the TX buffer. Of course, maybe there are some commands in the protocol not included in Linux.

the drivers shipped by Apple are the only ones available on this hardware so far.

Let's see if there will be a timely response from Apple regarding the "feedback" ...

That is very interesting! I'm also curious what you hear back.

That's extremely interesting, @sirhcel!

the output data is garbled and looks like written data did not got appended to the transmit buffer but somewhat overwrote existing data

This might be an effect from resetting the chip. But as we are talking about the land behind tcflush() and close(), I would not expect everything to be smooth and shiny there.

It's a slightly moot point now, but I looked into this a little more. In the Linux driver there are some USB protocol details that credit FTDI for help with the protocol so they're probably correct. If I read the capture from @brainstorm's repo correctly then the host is sending individual Reset commands for "purge RX buffer", "purge TX buffer", and then "purge RX buffer" again! It seems from the description that the intent for these commands is to drop bytes in the buffers, not wait for them to be sent.

Isn't this meant for tcflush() when one want's to discard data?

I don't see any FTDI commands for managing the TX buffer otherwise (waiting for it to be empty, or checking if it's empty). So once the data goes to the chip the only options might be to wait for the bytes to be sent on the wire, or reset the TX buffer. Of course, maybe there are some commands in the protocol not included in Linux.

Aren't we just interested in getting the data on the wire? So waiting for the data to be sent is enough in our case. And this apparently works well on Linux.

I had a dive through Linux serial driver stack today and found serial_wait_until_sent which is uses the driver ops wait_until_sent which looks like being backed by usb_serial_generic_wait_until_sent which in turn uses tx_empty from fdti_sio.c.

the drivers shipped by Apple are the only ones available on this hardware so far.
Let's see if there will be a timely response from Apple regarding the "feedback" ...

That is very interesting! I'm also curious what you hear back.

I finally got feedback from Apple. The engineer stated that once the data gets passed from the serial framework to the USB serial driver, its job is done. I'm reading this as that there are no means for checking whether all data has been written out or enforcing this.

I have no experience with writing device drivers for macOS so far. But in case IOUserUSBSerial is the interface to implement for an USB serial driver, I don't see any means for waiting that all data got on the wire or enforcing this so far.

I've got the recommendation that an application could wait before closing. But from a library perspective there would be a lot of fairy dust required to get an estimation for this time right ...

Thanks for sharing your experience asking Apple about this.

Isn't this meant for tcflush() when one want's to discard data?

Totally agree this seems to be the purpose of these. I also agree that this isn't what serialport-rs is asking for anywhere, but it seems to be happening anyway.

I had a dive through Linux serial driver stack today and found serial_wait_until_sent which is uses the driver ops wait_until_sent which looks like being backed by usb_serial_generic_wait_until_sent which in turn uses tx_empty from fdti_sio.c.

Oh, nice! This is exactly what I was looking for (and didn't find) when I last posted.

The engineer stated that once the data gets passed from the serial framework to the USB serial driver, its job is done.

I guess this comes down to the OS vendor's interpretation of where "the terminal" sits in this pipeline, so there's some wiggle room! It doesn't seem unreasonable for them to say this, though.

Although it does seem inconsistent with this philosophy if the driver later explicitly asks the USB device to discard some of that data from its FIFOs.

There are also maybe fiddly implications for functions like tcsettatr(.., TCSADRAIN, ..) - is there actually no way to reliably say ("wait for all bytes to have written out of the UART, then change the baud rate")?!?

I have no experience with writing device drivers for macOS so far. But in case IOUserUSBSerial is the interface to implement for an USB serial driver, I don't see any means for waiting that all data got on the wire or enforcing this so far.

I have absolutely no knowledge of macOS drivers either, but agree based on a quick look. I did wonder where other control requests (baud rate, etc.) are sent, this looks like IOUserSerial.

I suppose throwing AppleUSBFTDI.kext into something like Ghidra would confirm this, but regardless there doesn't seem to be any way in either driver interface to check if all bytes have been sent to the hardware.

I did note a way to reset the FIFOs (HWResetFifo). This seems like it maps to that tcflush()-equivalent functionality that was captured by @brainstorm happening automatically on close (I don't mean macOS is necessarily calling this driver function, but it might mean that).

I also wondered if there's was any non-BSD userland serial API that might change that default behaviour, but the Apple IOKit docs literally say to use the BSD APIs so I guess not.

I've got the recommendation that an application could wait before closing. But from a library perspective there would be a lot of fairy dust required to get an estimation for this time right ...

I'm mostly a bystander here, but FWIW I agree. It's fiddly at best, and impossible if hardware flow control is enabled.

EDIT: Added one more reply at the top.