FrameworkComputer/SoftwareFirmwareIssueTracker

USB-C Problems with USB-PD cycling every second

Opened this issue · 11 comments

Hi,

Equipment:

  1. FW16 laptop
  2. USB-C external NVME enclosure.
  3. Non-FW16 laptop.
  4. A usb-c cable with breakout.

The problem I see is that the same NVME enclosure and USB-C cable work reliably on a non-FW16 laptop (e.g. an HP laptop), but on the FW16 laptop the connection keeps cycling, about once every second, and never connects.

I have done some analysis with USB-C sniffers/testers and a data capture oscilloscope.

Some details can be found here:
https://community.frame.work/t/framework-180w-adaptor-no-usb-pd-15v/70913/20

In summary, the SRC (FW laptop) sends an "Accept" on CC1 but the SNK (device, NVME enclosure) never sends a GoodCRC back.

I have looked in detail at the content of the "Accept" message and compared the oscilloscope data captures between the FW16 laptop and the Non-FW laptop.

The CC messages themselves are all pretty much the same.
So, why is the SNK never replying to the "Accept"?
After a lot of testing, the best theory I have is that the SNK (device, NVME enclosure) is not detecting the IDLE between CC messages, and thus never cuts in and tries to transmit the GoodCRC.

I have found one major difference between the two.

  1. FW16 laptop idles at about 0.88 Volts.
  2. Non-FW laptop idles at about 1.67 Volts.

The normal swing (peak-to-peak) of CC messages is between 0V and 1.2V.
The idle is supposed to be when the output is set to "high impedance".

As far as I can tell, the only thing different that can affect the idle volts is the value of "Rd" and "Rp" in the laptop.

Note: Another observation is that changing the USB-C cable changes the outcome. I.e. some cables work, some do not work on the FW16, but all the same cables work on a Non-FW laptop.

There are possibly other causes for the NVME not responding to "Accept", but I need more time to look into those.
E.g. 1) As the idle voltage is only 0.88V, the first bit of the 64-bit preamble is sometimes not detected, or an extra bit is added, so detection is not 100% reliable. With the idle voltage at 1.67V, the 64 bits are detected very reliably. But that is why one has the "Sync-1" tokens, so the receiver can get back into sync even if there are not exactly 64 bits of preamble.
E.g. 2) The SNK (NVME device) does not have enough power to respond. This needs further investigation. The SNK was able to send "GoodCRC" and "Request" messages before receiving the SRC "Accept" messages.

The USB-PD R3.2 standard sections relating to "Idle" are:
5.7 Collision Avoidance
5.8.5.4 Inter-Frame Gap
5.8.6.1 Definition of Idle.

From those sections, an endpoint detects idle by seeing fewer than nTransitionCount transitions within the time window tTransitionWindow. Thus "idle" detection is due to a lack of transitions rather than any idle voltage level. So, I don't think the 1.67 vs 0.88 Volts is a problem unless the "high impedance" requirement is missing.
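The transition-counting rule described above can be sketched in a few lines. This is only an illustration, not the logic of any real PD controller: the threshold of 3 transitions and the 12 µs window are values I am assuming for the two spec parameters, and the edge timestamps are made up.

```python
from bisect import bisect_right

# Assumed values for illustration; check the spec tables before relying on them.
N_TRANSITION_COUNT = 3   # assumed nTransitionCount
WINDOW_US = 12.0         # assumed window length, in microseconds


def is_idle(transition_times_us, now_us,
            window_us=WINDOW_US, n_transition_count=N_TRANSITION_COUNT):
    """Return True if fewer than n_transition_count transitions occurred
    in the window (now_us - window_us, now_us].

    transition_times_us must be sorted in ascending order.
    """
    newest = bisect_right(transition_times_us, now_us)
    oldest = bisect_right(transition_times_us, now_us - window_us)
    return (newest - oldest) < n_transition_count


# Hypothetical edge timestamps (in µs) for the tail of a message.
edges = [0.0, 1.67, 3.33, 5.0, 6.67]
print(is_idle(edges, 7.0))   # window still contains all five edges
print(is_idle(edges, 20.0))  # no edges in the last 12 µs
```

Note that the voltage level is not an input to this check at all, which is the point made above: only the number of edges in the window matters.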

I am hoping that someone else, with more knowledge of USB-PD than me, can let me know if these differences are significant or not.

Image

Looking at the screenshot.
This is an oscilloscope trace showing the end of a USB-PD CC message, showing it entering the "idle" period.
The White trace is a FW16.
The blue trace is a non-FW laptop.

Notice that the non-FW laptop holds the level at 0V for longer, before it lets it return to its high impedance state.
The FW is holding it for about 1uS, whereas the non-FW laptop is holding it for about 13uS.
The max allowed hold time is 23uS, the minimum is 1uS.
It looks to me that the FW16 USB-PD is cutting it fine, and should probably hold it for longer.
Note: I think the line is only wobbly because it was captured with only an 8-bit oscilloscope at a relatively low sample rate. There are only about 5-6 samples per wave.
Notice also, that the FW16 high impedance level is below (0.88 V) the wave peaks (1.2 V), but the non-FW laptop high impedance level is way above (1.67V ) the wave peaks (1.2 V).
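As a quick sanity check, the hold times read off the screenshot can be compared against the limits quoted above. This is a minimal sketch; the limits are the 1 µs and 23 µs figures from this post, and the function name is made up for illustration.

```python
MIN_HOLD_US = 1.0    # minimum hold time quoted above
MAX_HOLD_US = 23.0   # maximum hold time quoted above


def classify_hold_time(hold_us):
    """Classify a measured hold time (µs) against the quoted limits."""
    if hold_us < MIN_HOLD_US:
        return "too short"
    if hold_us > MAX_HOLD_US:
        return "too long"
    return "within limits"


# Approximate values read from the oscilloscope screenshot.
for name, hold_us in (("FW16", 1.0), ("non-FW", 13.0)):
    margin = hold_us - MIN_HOLD_US
    print(f"{name}: {classify_hold_time(hold_us)}, "
          f"{margin:.1f} us above the minimum")
```

Both measurements pass, but the FW16 value has no margin above the minimum, while the non-FW laptop sits roughly in the middle of the allowed range.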

Here we have a clear and obvious bug:

  1. It looks like the main problem here is that the Laptop (SRC) sends the USB-PD CC1 (BMC encoded) “ACCEPT” but never receives a GOODCRC in return from the device (SNK).

I don't know what the cause of this is yet; it could be any of the following (guesses):

  1. Signal noise or distortion or wrong levels on the Laptop TX side so that when it arrives at the destination device, it sees CRC errors. Note: There is no reporting of bad CRC, so this will be hard to be sure about.
    A: From looking at the oscilloscope traces, there may be a levels problem. I.e. the voltage level during idle periods might be too low. There is no obvious noise or distortion problem otherwise.
  2. The Laptop TX side does not wait long enough between transmissions for the device to react.
    A: Minimum inter-frame-gap is 25 uS. The FW16 waited and retried "Accept" in 1099 uS. So, it waited long enough.
  3. The Laptop RX side is not sensitive enough and sees more CRC errors than other non-FW laptops.
    In summary, I need to find a way for the spy/monitor to report bad CRC errors and maybe hook up a digital oscilloscope to look for problems in the signal.
    As the TX and the RX are done over the same wire, it will be difficult to know which side is sending when looking at the oscilloscope display.
    A: From looking at the oscilloscope traces, this looks unlikely. The SNK (device) never actually sends the message to the SRC(laptop), so not likely a Laptop RX sensitivity problem.
  4. The laptop is not providing enough power to the device to complete the CC power negotiation process.
    A: Not investigated yet.
  5. The SNK (device) is simply not responding to the SRC (laptop) ACCEPT request.
    I have connected a data capture oscilloscope to the CC1 pin and then decoded the resulting CC messages. I checked all their CRCs and they are OK, but (5) is confirmed. The SNK is simply not responding to the SRC ACCEPT request, as no pulses for it appear on the oscilloscope.
    A: This is clearly the current symptom / bug.
  6. The SNK (device) is not correctly detecting when the CC link is "idle", and thus not responding when it should. This needs further investigation. The "gap" between messages does appear to be different when comparing FW and non-FW laptops.
    A: This is the most likely cause currently.
  7. Observation, but no impact on this problem: Neither the FW nor the Non-FW laptop sends requests to find out which type of cable is attached.
    A: Not investigated yet.

Note: It is interesting that the "idle" voltage is not mentioned anywhere in the USB-PD standards document. The standard does not say whether the "idle" voltage affects detection of the first bit of the next message, and has no commentary on how the receiving device should react to it. So, if this "idle" voltage is the real problem, no USB test equipment would ever test for it!

Caveat: When I mention non-FW laptop, I have only tested with a single non-FW laptop. So I cannot have a view on whether all non-FW laptops behave the same as this single non-FW laptop.

Trying to understand why the "idle" voltage is 0.88V for the FW16, but 1.67V for the Non-FW laptop.

Image

That is "Figure 5.24 Transmitter Load Model for BMC Tx from a Sink" from the USB-PD R3.2 standard doc.

Here the left hand side is the Sink (Device) wishing to transmit to the SRC (Laptop) on the right hand side.
During idle in the FW16 case:
The Sink (Device) has Rd.
The SRC (FW16 Laptop) has Rp

During idle in the Non-FW laptop case:
The Sink (Device) has Rd.
The SRC (Non-FW Laptop) has Rp

The only difference therefore is Rp on the laptop side.
But, the Laptop does have both Rp and Rd. They are programmatically switched in and out of circuit depending on the Laptop role (Source or Sink).

So, could it be that the FW Laptop is accidentally leaving Rd in circuit during idle periods and when it is acting as a SRC, when it should not?

I would need the source code for the PD firmware to know for sure, but I don't have that.
I am only mentioning it, because I think it is the first thing that FW engineering should check.

Some maths to look into this further.
A simple two-resistor divider gives Vout = Vin · R₂ / (R₁ + R₂)
With the Vin = 5V.
For the non-FW laptop case, Vout = 1.67V
Result: R1 = 2 * R2.
So, say R1 = 20k, and R2 is 10k, we would get 1.67V out. So, here Rp = 20k, and Rd = 10k.
Note: Rp and Rd given random values here, the important part in these calculations is their ratio, not their actual values.
What about if Vout is 0.88 V.
Result: R₁ ≈ 4.68 × R₂
So, say R1 is still 20k, R2 is then needing to be about 4.2k.
If there were 2 Rd still in circuit, then 2 Rd in parallel =
1 / Req = 1/R2a + 1/R2b
1 / Req = 1/10k + 1/10k
Req = 5k.

So, with both Rp and Rd in circuit on the SRC, and Rd in circuit on the Sink.
The result is surprisingly close to what we are seeing in reality.
I.e. 5k is close to 4.2K.
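The arithmetic above can be reproduced with a short script. It is only a check of the numbers in this post: the 5V supply, the 20k value for R1 and the 10k value for each Rd are the same placeholder assumptions used in the text, not measured or specified values.

```python
def divider_ratio(v_in, v_out):
    """Return R1/R2 for a divider where Vout = Vin * R2 / (R1 + R2)."""
    return (v_in - v_out) / v_out


def parallel(*resistors):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors)


V_IN = 5.0      # assumed supply voltage, as in the text
R1 = 20_000.0   # placeholder value for Rp, as in the text

for v_out in (1.67, 0.88):
    ratio = divider_ratio(V_IN, v_out)
    print(f"Vout={v_out} V: R1/R2 = {ratio:.2f}, R2 = {R1 / ratio:.0f} ohm")

# Two placeholder 10k Rd resistors in parallel.
print(f"Req = {parallel(10_000.0, 10_000.0):.0f} ohm")
```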

Note: The USB Standard states that the Rd is 5.1k ± 5%
The Rp can be (programmatically set to one of): 56K ± 20%, 22k ±5%, or 10K ± 5%.

But the analysis in this post is not correct. See the following posts, where the Rp is discussed and where setting it to the 10k (3A) value, instead of the FW-set value of 22k, helps things considerably.

There is another USB standard document:
Title: “USB Type-C® Cable and Connector Specification Release 2.4”
DocName: “USB Type-C Spec R2.4 - October 2024.pdf”
https://www.usb.org/sites/default/files/USB%20Type-C%202.4%20Release%20202410.zip

“Table 4-36 Sink CC Pin Voltages for Connect and Current Advertisement Detection for Rd ± 10%”
Where CC Voltage ranges can be:
vRd-1.5; Min=0.746 V, Max=1.164 V
vRd-3.0: Min=1.369 V, Max=2.042 V

So, a CC voltage of 0.88V is reasonable, if the FW16 SRC is advertising 1.5A.
But the bit that is a little confusing is that in the associated CC message, the FW16 SRC advertised 3.0A
and SRC (FW16) “Accept” the Sink Device “Request” for 3.0A.
So, why is the CC voltage 0.88V? It should (maybe) be in the vRd-3.0 range to match the CC message.
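To make the mismatch concrete, the two measured idle voltages can be looked up against the ranges quoted above. This is a minimal sketch that only knows the two ranges listed in this post; the function name is made up for illustration.

```python
# Ranges quoted above from Table 4-36 (volts).
VRD_RANGES = {
    "vRd-1.5": (0.746, 1.164),
    "vRd-3.0": (1.369, 2.042),
}


def classify_cc_voltage(volts):
    """Return the name of the quoted range containing volts, or None."""
    for name, (low, high) in VRD_RANGES.items():
        if low <= volts <= high:
            return name
    return None


for label, volts in (("FW16", 0.88), ("non-FW", 1.67)):
    print(f"{label}: {volts} V -> {classify_cc_voltage(volts)}")
```

With these ranges the FW16 idle level lands in the 1.5A advertisement range, while the PD messages on the same wire advertise and accept 3.0A.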

But none of this really explains why the Sink does not send a “GoodCRC” response back when the SRC sent the “Accept”.

I used the "pdwrite" command with some parameters (setting the 3A Rp resistor to 10k) and made some progress.
We actually get an "ACCEPT" then "GoodCRC" now, so that is progress.
It falls over just a little later though, with no "GoodCRC" to the "GET_SRC_CAP".
Up until the ACCEPT point the "idle" is at 1.56 V.
But just after the "GET_SRC_CAP", the "idle" has dropped down to 0.88 V again. The EC set the Rp resistor back to 22k.
So, I think that is pretty conclusive.
The 0.88 V is causing the problem with some devices.
Setting the 3A Rp resistor to 10k, and thus making the "idle" at 1.56V seems to be the solution to this USB Cycling problem.

Side note: some of the “hold time”, where the CC output is held at 0V before returning to idle volts, was way below the 1uS minimum limit in the USB spec. That will need fixing also. I measured some at 0.1uS.

EVENT   173173  1       EVENT_ATTACHED  944
DEBUG   173173  1       VBUS:5165, CC:2 945
SRC     173233  1       SOP      PD3    s:006       H:0x11A1            (id:0, DR:DFP, PR:SRC)  SRC_CAPABILITIES        DATA: 2C910127
Option:         UNCHUNK DRD     USB     DRP
 [1] Fixed : 5V - 3A
        946
SRC     173235  1       SOP      PD3    s:006       H:0x11A1            (id:0, DR:DFP, PR:SRC)  SRC_CAPABILITIES        DATA: 2C910127
Option:         UNCHUNK DRD     USB     DRP
 [1] Fixed : 5V - 3A
        947
SRC     173237  1       SOP      PD3    s:006       H:0x11A1            (id:0, DR:DFP, PR:SRC)  SRC_CAPABILITIES        DATA: 2C910127
Option:         UNCHUNK DRD     USB     DRP
 [1] Fixed : 5V - 3A
        948
SRC     173416  1       SOP      PD3    s:006       H:0x13A1            (id:1, DR:DFP, PR:SRC)  SRC_CAPABILITIES        DATA: 2C910127
Option:         UNCHUNK DRD     USB     DRP
 [1] Fixed : 5V - 3A
        949
SNK     173417  1       SOP     s:002       H:0x0281     (id:1, DR:UFP, PR:SNK)         GOODCRC 950
SNK     173420  1       SOP      PD3    REQUEST s:006       H:0x1082            (id:0, DR:UFP, PR:SNK)  DATA: 2CB10412
ObjectPosition:1
GiveBack:0
CapabilityMismatch:0
USBCommunicationCapable:1
NoUSBSuspend:0
UnchunkedExtendedMessagesSupported:0
MaximumOperatingCurrent:3000mA
OperatingCurrent:3000mA 951
SRC     173420  1       SOP     s:002       H:0x0161     (id:0, DR:DFP, PR:SRC)         GOODCRC 952
SRC     173423  1       SOP      PD3    ACCEPT  s:002       H:0x05A3     (id:2, DR:DFP, PR:SRC)         953
SNK     173423  1       SOP     s:002       H:0x0481     (id:2, DR:UFP, PR:SNK)         GOODCRC 954
SRC     173459  1       SOP      PD3    PS_RDY  s:002       H:0x07A6     (id:3, DR:DFP, PR:SRC)         955
SNK     173460  1       SOP     s:002       H:0x0681     (id:3, DR:UFP, PR:SNK)         GOODCRC 956
SRC     173567  1       SOP      PD3    GET_SRC_CAP     s:002       H:0x09A7     (id:4, DR:DFP, PR:SRC)         957
SRC     173569  1       SOP      PD3    GET_SRC_CAP     s:002       H:0x09A7     (id:4, DR:DFP, PR:SRC)         958
SRC     173570  1       SOP      PD3    GET_SRC_CAP     s:002       H:0x09A7     (id:4, DR:DFP, PR:SRC)         959
SRC     173573  1       SOP      PD3    SOFT_RESET      s:002       H:0x01AD     (id:0, DR:DFP, PR:SRC)         960
SRC     173574  1       SOP      PD3    SOFT_RESET      s:002       H:0x01AD     (id:0, DR:DFP, PR:SRC)         961
SRC     173576  1       SOP      PD3    SOFT_RESET      s:002       H:0x01AD     (id:0, DR:DFP, PR:SRC)         962
DEBUG   173650  1       VBUS:990: CC:2  963
EVENT   173650  1       EVENT_DETACHED  964
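The H: values in the capture above are the 16-bit PD message headers, and decoding them by hand is a useful cross-check on the sniffer. Below is a minimal sketch. The bit layout and the message type numbers are my reading of the PD message header definition and cover only the messages seen in this capture; the names are copied from the log.

```python
# Message type numbers, limited to those that appear in the capture above.
CONTROL_TYPES = {1: "GOODCRC", 3: "ACCEPT", 6: "PS_RDY",
                 7: "GET_SRC_CAP", 13: "SOFT_RESET"}
DATA_TYPES = {1: "SRC_CAPABILITIES", 2: "REQUEST"}


def decode_header(header):
    """Decode a 16-bit SOP message header into its main fields."""
    num_objects = (header >> 12) & 0x7   # bits 14..12: data object count
    msg_type = header & 0x1F             # bits 4..0: message type
    # Messages with data objects are data messages, otherwise control.
    names = DATA_TYPES if num_objects else CONTROL_TYPES
    return {
        "type": names.get(msg_type, f"UNKNOWN({msg_type})"),
        "data_role": "DFP" if header & (1 << 5) else "UFP",    # bit 5
        "power_role": "SRC" if header & (1 << 8) else "SNK",   # bit 8
        "id": (header >> 9) & 0x7,       # bits 11..9: message ID
        "objects": num_objects,
    }


for h in (0x11A1, 0x1082, 0x05A3, 0x0481, 0x09A7):
    print(f"0x{h:04X}", decode_header(h))
```

The decoded IDs make the failure point easy to see: the GoodCRC with id 2 answers the ACCEPT with id 2, but nothing ever answers the GET_SRC_CAP with id 4.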

Some points to note about the 22k vs 10k resistor and its potential effects on devices.

  1. 22k results in idle 0.88V. Idle is the voltage between CC messages.
  2. 10k results in idle 1.6V. Idle is the voltage between CC messages.
  3. The CC messages swing between 0 and 1.2V.
  4. Current = V / (Rp + Rd)
    5V / (22k + 5.1k) = 0.184mA
    5V / (10k + 5.1k) = 0.331mA
  5. Current affects noise immunity. More current == better noise immunity. So the 10k case is about twice as noise immune as the 22k case. Note: a lot of other factors also affect noise immunity, this is just one of them.
  6. Using idle of 0.88V makes the detection of the first bit of the CC message difficult to get right. It sometimes adds an extra bit, sometimes drops the first bit.
  7. Every non-FW laptop I tested with uses the 1.6V idle option. I have only found the FW16 laptop using the 0.88V idle option.
  8. Every USB device I have negotiates the power using the CC messages. I don't have any devices that only used the 22k/10k to decide how much power to draw.
  9. Only USB-C has CC messaging, so the 22k/10k only affects things one plugs into USB-C ports. Those are all normally modern components and thus unlikely to only use the 22k/10k to decide on current draw.
  10. The CC message detection logic in USB-C chips is normally done in hardware so firmware updates are unlikely to make them better at detecting CC messages. There are a lot of problematic USB-C chips out there.
  11. When problem cycling devices are instead plugged into USB-A ports, they have worked OK. My theory on that, is that USB-A does not have CC messages, and thus the Rp resistor is not involved at all with USB-A, so those obviously will work. E.g. the problem iPhone noted in other threads. USB-A uses different pins for negotiation messages and also different modulation.
    USB-A uses: Frequency Shift Key (FSK) modulation coupled onto the VBUS wire.
    USB-C uses: Bi-phase mark code (BMC) encoding onto the CC1/CC2 wire.
  12. I don't exactly know why the USB-C standard chose BMC. FSK is much better at noise immunity than BMC. It is interesting to note the USB-C supports both. It says all devices must do BMC and FSK is optional, but it might be worth the FW16 falling back to FSK if the BMC negotiation fails.
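The numbers in points 1, 2 and 4 can be cross-checked with a simple Rp/Rd divider model. This is a sketch under assumptions: a 5V pull-up as in point 4, nominal resistor values with no tolerance, and nothing else loading the CC line.

```python
RD_OHMS = 5_100.0   # nominal Rd in the sink
V_PULLUP = 5.0      # assumed pull-up voltage, as in point 4


def cc_current_ma(rp_ohms):
    """Current through Rp and Rd in series, in milliamps."""
    return V_PULLUP / (rp_ohms + RD_OHMS) * 1000.0


def cc_idle_voltage(rp_ohms):
    """Nominal idle voltage on CC from the Rp/Rd divider."""
    return V_PULLUP * RD_OHMS / (rp_ohms + RD_OHMS)


for rp in (22_000.0, 10_000.0):
    print(f"Rp={rp / 1000:.0f}k: {cc_current_ma(rp):.3f} mA, "
          f"idle {cc_idle_voltage(rp):.2f} V")

ratio = cc_current_ma(10_000.0) / cc_current_ma(22_000.0)
print(f"current ratio 10k/22k = {ratio:.2f}")
```

The nominal idle voltages from this model come out slightly above the measured 0.88V and 1.6V, which is plausible given resistor tolerances and measurement error, and the current ratio is a little under 2.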

Summary:
My advice to FW, for designing future main-boards, is to design them so that they can all use the 10k Rp and have associated over-current protection, so that if devices draw too much, it will not damage the main-board. This will give future FW main-boards better compatibility with a wider range of USB-C devices.

As a workaround, but just for me, I will be adding an EC command so that I can change the Rp on one of the USB-C ports to output idle 1.6V. Then I can connect these problematic cycling USB-C devices to that one port and at least they will connect and not cycle.

These are actually pretty interesting findings. I wonder if this may be related to other problems we observe, e.g. Thunderbolt link speed negotiation (for which I also filed an issue). Could you share the pdwrite commands you used? I'll be back in my lab tomorrow and could try out whether it makes a difference or not.

I believe I've had this issue with one of my older cables, blamed it on the copper work hardening (which could contribute probably) but I think switching out the charger finally solved it for me. On the AI300 board.

Could you share the pdwrite commands you used?

I don't think they would be very useful. If you read my comments on the pdwrite, all it did was get the "GoodCRC" from the "Accept" message, but then the idle volts dropped again, essentially undoing the "pdwrite" command. The result was that it still did not complete the negotiation process successfully. All it showed was that when idle is set to 1.6V, it gets further through the negotiation sequence.
Forcing it to 1.6V for the entire negotiation would need EC firmware changes, which I have not written yet, so you cannot try them.

I have tested with 3 non-FW laptops. They all present idle at 1.6V (3A Rp 10k), and do not idle at 0.88V (1.5A Rp 22k).
They also all hold 0V, at the end of a message, for longer than the FW before returning to Idle volts.

Apparently I have the same issue with my Framework 13 Ultra 7 155H. When I connect my Thunderbolt 3 eGPU (Razer Core X Chroma) to any other laptop, it powers on, but on the Framework it just power-cycles on and off, on both the laptop and the Razer eGPU. When turning on the laptop, the Razer eGPU never turns on. (I only got it to turn on once.)

From reading this whole thread and this thread https://community.frame.work/t/usb-c-problems-with-usb-pd-cycling-every-second-ang/71349/7, I think I have the exact same issue. I have a total of 19 other laptops with varying USB-C connectors, and all of them work without USB-PD cycling.