Ublox module not respecting I2C setup times
nseidle opened this issue · 11 comments
I think this may be the main culprit behind issues #37 and #38 but it's still a bit murky.
The ZED-F9P seems to be violating I2C setup times.
How to reproduce: Use a fast microcontroller such as the Artemis/Apollo3 and run Example99 on it. Using an analyzer we see the following:
The logic analyzer sees this byte as 0x21 but the Artemis sees it as 0xA1. This bit flip causes the CRC to fail and the frame to be tossed. Not terrible unless we are really trying to achieve fast update rates.
In the above image we zoom in on this 0x21 byte. The time between data going low and clock going high is 62.5ns. This is very short for I2C. 62.5ns is the shortest time slot my analyzer can capture so I need to get a better analyzer.
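To illustrate why a single flipped bit (0x21 read as 0xA1) tosses the whole frame, here is a minimal Python sketch of the 8-bit Fletcher checksum that UBX frames carry. The frame body is made up for illustration and is not taken from the capture above.

```python
def ubx_checksum(body: bytes) -> tuple[int, int]:
    """8-bit Fletcher checksum over class, ID, length and payload bytes."""
    ck_a = 0
    ck_b = 0
    for byte in body:
        ck_a = (ck_a + byte) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    return ck_a, ck_b


# Hypothetical frame body: class 0x01, ID 0x07, length 4 (little endian),
# then four invented payload bytes, one of which is 0x21.
sent = bytes([0x01, 0x07, 0x04, 0x00, 0x10, 0x21, 0x32, 0x43])
sent_checksum = ubx_checksum(sent)

# The same body as the controller would see it if the top bit of 0x21
# were misread as 1, turning the byte into 0xA1.
received = bytes([0x01, 0x07, 0x04, 0x00, 0x10, 0xA1, 0x32, 0x43])
received_checksum = ubx_checksum(received)

# The receiver compares its own checksum with the one sent in the frame.
frame_accepted = received_checksum == sent_checksum
print(sent_checksum, received_checksum, frame_accepted)
```

Any single-bit change shifts the first checksum byte by a non-zero amount, so the comparison fails and the frame is dropped.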
According to the I2C spec
"[3] A device must internally provide a hold time of at least 300ns for the SDA signal to bridge the undefined region of the falling edge of SCL." The ZED-F9P is not doing this so there are random 0-bits that are being mis-interpreted.
The ATmega328 does not experience this because it is too slow. Below is the "how many bytes available" read on the ATmega328:
4.875uS. It is interesting to note it looks like the ZED-F9P is clock stretching.
And below is the equivalent read on the Artemis:
If I had to guess, the I2C hardware on the ZED-F9P is malfunctioning. It is releasing the SCL line before it has properly set SDA. Almost all the I2C byte transactions look clean on the Artemis. But after the ZED-F9P has clock stretched, it gets really close to a bit error.
Here's an example. In the middle of a 100 byte read from ZED to Artemis, the ZED-F9P clock stretches to tell the Artemis to hold:
52us is the clock stretch. Once the ZED-F9P releases the clock the Artemis immediately resumes clocking out the data. But the ZED-F9P has not set SDA with the proper setup time of at least 300ns. In this example Artemis correctly read the first bit as 0 but because the timing is violating the I2C spec Artemis cannot be guaranteed to interpret that first bit correctly.
TODO:
- Test on Teensy. This will help determine if the problem is Artemis or ZED-F9P.
  - Both platforms see similar timing violations of the I2C spec (62ns). Note: Teensy doesn't get CRC error where Artemis does intermittently.
- Test at 100kHz. All testing so far has been at 400kHz. Setup time should still be 300ns on SDA but it may be interesting.
  - I2C timing violation appears at both 100kHz and 400kHz on both Teensy and Artemis.
- Test SAM-M8Q to see if similar timing errors exist.
  - Timing violation is longer (187ns vs 62ns) but is still too short for the I2C spec (300ns).
SAM-M8Q shows similar timing issues but with more clock stretching.
Above is Artemis reading 100 bytes from SAM-M8Q. The SAM-M8Q stretches just a bit between each byte. I chalk this up to the SAM-M8Q engine (ZOE?) being much less powerful than the ZED-F9P: a slower internal clock, fewer resources, etc. make it less capable of responding to constant I2C clocks. So it stretches.
In the above image the SAM-M8Q is stretching and has an SDA setup time of 187ns. This is longer than on the ZED-F9P but still well under the 300ns required by the I2C standard. So SAM-M8Q is also violating the I2C standard (I think).
Shown above, the Teensy 3.2 experiences similar 187ns SDA setup times on the SAM-M8Q. Note the Teensy is running the I2C clock at 375kHz, not full 400kHz as requested.
Shown above, Teensy 3.2 with the ZED-F9P. Setup time decreases to 62ns just like Artemis. Short setup time is seen directly after clock stretching, just like Artemis. It is worth noting that the Teensy does not get CRC errors. Meaning, I believe the Teensy I2C hardware is more fault tolerant than the Apollo3. Not saying it doesn't happen, I just haven't captured a CRC error on the Teensy yet.
All of these setup times (62ns vs 187ns, etc) are limits of my analyzer (integer values of 62.5ns / 16MHz analyzer). I'll get a faster one to try to dial in that setup time.
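As a rough sketch of what that resolution limit means, the Python below estimates the range of true intervals that could produce a given reading. It assumes an ideal analyzer that samples both channels at the same instants and timestamps each edge at the next sample; channel skew and input thresholds are ignored.

```python
SAMPLE_RATE_HZ = 16_000_000              # analyzer sample rate quoted above
SAMPLE_PERIOD_NS = 1e9 / SAMPLE_RATE_HZ  # 62.5 ns per sample


def interval_bounds_ns(reported_ns: float) -> tuple[float, float]:
    """Bounds on the true interval behind a reported edge-to-edge time.

    Each edge can be registered up to one sample late, so the true interval
    lies within one sample period either side of the reported value.
    """
    samples = round(reported_ns / SAMPLE_PERIOD_NS)
    low = max(0.0, (samples - 1) * SAMPLE_PERIOD_NS)
    high = (samples + 1) * SAMPLE_PERIOD_NS
    return low, high


for reported in (62.5, 187.5):
    print(reported, interval_bounds_ns(reported))
```

Under these assumptions a reading of 62.5ns could be anything from 0 to 125ns and a reading of 187.5ns anything from 125 to 250ns, so both remain below 300ns even at this resolution.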
There was a similar issue in the NodeMCU firmware, maybe this workaround can be applied.
nodemcu/nodemcu-firmware#1586
Based on the debugging below, I assume it's the same problem. I'm observing it on an STM32 board:
Sending: CLS:1 ID:7 Len: 0x0 Payload:
No bytes available
No bytes available
No bytes available
Bytes available:773
Size: 92 Received: CLS:1 ID:7 Len: 0x5C Payload: 28 7 [snipped payload] 23 0 0 0 0 0 0 0 0 0
CLS/ID match!
Sending: CLS:A ID:4 Len: 0x0 Payload:
Bytes available error:13666
Bytes available:13666
waitForResponse timeout
I see that "13666" value a lot. It seems magic, but on the other hand it could just be garbage bits from the stretch. Note it works absolutely perfectly on a Thing Plus.
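One possible reading of that "magic" value, offered as speculation and not confirmed anywhere in this thread: 13666 is 0x3562, and UBX frames begin with the sync bytes 0xB5 0x62. The short Python sketch below checks the arithmetic, assuming the two-byte count is assembled high byte first and that bit 7 of the first byte was lost somewhere (misread on the bus or masked off in software).

```python
UBX_SYNC = bytes([0xB5, 0x62])  # the two sync bytes that start every UBX frame

observed = 13666
# Split the 16-bit value into the two bytes it would be built from,
# assuming the high byte is read first.
msb, lsb = divmod(observed, 256)

# Hypothesis only: the bytes read as the "bytes available" count were really
# the UBX sync bytes, with bit 7 of the first byte dropped.
candidate = ((UBX_SYNC[0] & 0x7F) << 8) | UBX_SYNC[1]

print(hex(msb), hex(lsb), candidate == observed)
```

If that is what is happening, the read is landing on the start of the data stream instead of the byte count, which would fit with garbage after a stretch.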
Would this be an issue on a Teensy 4? I'm starting a project with it and debating whether to use I2C or just a UART to talk to the F9P.
I've tested the latest v1.7.1 of this lib with Teensy 4. Works great on Wire (SDA0/SCL0). I haven't seen a risetime or CRC issue after ~15 minutes of testing.
I have not tested with other I2C ports on Teensy 4.
I need to revise my previous post (I failed to run both at 400kHz): Teensy 4 is experiencing similar CRC problems as Artemis.
Below is Teensy 4.0 connected to ZED-F9P at 400kHz:
And below is Artemis connected to ZED-F9P at 400kHz:
Both have a too short setup time from the ublox, both experience CRC issues at 400kHz, and very few (1 every ~30s) at 100kHz. Note both of these tests didn't have any other I2C devices on the bus.
With a few changes, CRC errors can be reduced to a manageable level:
- Run the I2C clock at 100kHz if possible.
- Remove other devices from the I2C bus.
- Remove any stray pull up resistance. Limit the SDA/SCL pullups to 2.2k. On the Artemis, you can further control the I2C pull up resistors by calling Wire.setPullups(0); //Disable pullups
- The race condition seems to be exacerbated when the ublox needs to clock stretch. Disabling the communication on other ports (for example, turning off NMEA on UART1/2, etc) may decrease the need for the ublox module to clock stretch.
- The race condition seems to be exacerbated when the ublox needs to clock stretch. Disconnecting the USB connection on some modules (ZED-F9P) seems to free up resources on the ublox module which greatly reduces the CRC errors.
Changing the I2C pull-ups can make a big difference to the bus errors.
If I disable the extra pull-up resistors on the ZED-F9P breakout by cutting the split pads and disable the internal pull-ups in the Artemis, then I see no bus errors.
If I set the Artemis pull-ups to 1.5K, I start to see bus errors.
My theory: the slower rise time on the clock leading edge helps improve (extend) the data set-up time.
Here's an oscilloscope trace of the clock and data lines with the extra pull-ups disabled:
Here's a trace with the Artemis pull-ups set to 1.5K:
Note the much faster clock rise times (as expected with the lower pull-up resistance). Also note the change in the clock period!
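To put rough numbers on the rise-time theory, here is a small Python sketch of the 30%-to-70% rise time of a line charged through a pull-up. The 50pF bus capacitance and the 10k entry are assumptions picked for illustration (the thread does not give the bus capacitance or the value of the module's internal pull-ups); 1.5k and 2.2k are the values mentioned in this thread.

```python
import math


def rise_time_ns(pullup_ohms: float, bus_capacitance_pf: float) -> float:
    """30%-to-70% rise time of an open-drain line charged through a pull-up."""
    # V(t) = VDD * (1 - exp(-t / RC)), so t70 - t30 = RC * ln(0.7 / 0.3).
    rc_ns = pullup_ohms * bus_capacitance_pf * 1e-3  # ohm * pF = 1e-3 ns
    return rc_ns * math.log(0.7 / 0.3)


BUS_CAPACITANCE_PF = 50.0  # assumed, not measured

for pullup in (1_500, 2_200, 10_000):  # 10k is a hypothetical weak pull-up
    print(pullup, round(rise_time_ns(pullup, BUS_CAPACITANCE_PF), 1))
```

With these assumed values a weaker pull-up adds hundreds of nanoseconds to the clock edge, which is the same order of magnitude as the setup-time shortfall discussed earlier in the thread.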
Food for thought. My advice: disabling any extra pull-ups is the way to go. The ones internal to the GNSS module do the job very nicely.
BTW: when I'm logging RAWX data, I have to disable the 7F check. RAWX data can legitimately contain 7F's in the data stream. Treating them as errors causes more problems. With pull-ups disabled and with the 7F check disabled too, I still see no bus / checksum errors at 400kHz.
I think we've solved most of this issue by removing the pull up resistors. If you're experiencing anything like this be sure to remove all pullups on peripheral boards and/or the controller. Most, if not all, u-blox modules have built in pull ups.
Notes added to readme.