8710A Reset Design Flaw

Question

Closed this issue 3 years ago · 35 comments

There is an issue in the way the reset circuitry on the BBB resets the 8710A Ethernet PHY.

When coming out of reset, the 8710A can end up in an indefinite state as a result. This is problematic because you can occasionally lose the ability to communicate with the BBB, and it requires a physical reset to clear the problem. For a remote system in a difficult-to-reach location, this creates a major problem.

We have found that you can hack the current pcb (Rev B or Rev C) and disconnect the reset line from the 8710A. Then, hack a wire from the 8710A reset pin to a GPIO on the BBB. Now, when the ethernet PHY hangs it can be reset by asserting the GPIO pin.

I would like to recommend a revision to the PCB to address this issue. At the moment the reset trace near the 8710A is on an inner layer and not easily accessible. It would be a big help if the trace was more easily accessible. It would be an even bigger help if there was a jumper and a couple pads.

We would be happy to contribute the firmware driver we developed for this.

Answer 1 · 2018-03-20T14:06:49.000Z

This is the exact same fix that was done on later boards. On mainline, the kernel MDIO device tree binding supports the concept of a GPIO reset line:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/Documentation/devicetree/bindings/net/mdio.txt?id=69226896ad636b94f6d2e55d75ff21a29c4de83b

If we ever did a rev D pcb, this is something i'd request..

Regards,

Answer 2 · 2018-03-20T16:09:01.000Z

@RobertCNelson

Do you know if anyone has converted the schematic and pcb from Allegro format to either Kicad or Altium?

Also, on my wish list would be to increase the amount of RAM.

Answer 3 · 2019-02-13T00:10:51.000Z

There might be a fix here: https://wp.josh.com/2018/06/04/a-software-only-solution-to-the-vexing-beagle-bone-black-phy-issue/

Answer 4 · 2019-02-13T00:14:46.000Z

On Rev D, would it be more appropriate to use a reset controller rather than an RC circuit?

Answer 5 · 2019-02-13T03:01:29.000Z

@rowsail we have been using a kernel mdio address fixup patch mentioned in that link you show. A dedicated gpio to reset the phy is the best option.

Regards,

Answer 6 · 2019-02-15T02:14:13.000Z

So until a design with this fix is released (I have the design in Altium BTW) would an external reset controller on a cape (or equivalent) fix the issue?

Answer 7 · 2019-02-15T02:25:23.000Z

@rowsail, in your design based off the beagle, cut the reset line so they aren't shared, wire a spare gpio to it with a pull-up/pull-down (haven't looked at the phy reset logic in awhile) and use the phy-reset binding, to control the gpio.

Regards,

Answer 8 · 2019-02-15T02:36:23.000Z

Thanks for your help Robert. That's a very short trace on Pin 19 before it gets to the via carrying the regular reset signal, but if this goes into production, it's not really the sort of thing I want to do on 100+ units! It's difficult to find out what the actual issue is from the posts re the fix: is it that the reset duration is not long enough, or something else? If the former, I can fix it simply with a reset controller on my main board that can drive the reset low for longer and when the voltage is at a sufficiently high level. If it is a problem with the PHY (a silicon issue) then sure, the only fix might be to additionally control its reset line by GPIO.

Answer 9 · 2019-02-15T02:40:05.000Z

@rowsail , i haven't looked at the actual timing signals, but from what i've been told, the sys_resetn is long enough to reset the am335x but not long enough to correctly reset the phy in 100% of all boards.

Regards,

Answer 10 · 2019-02-15T02:43:57.000Z

OK - great info - thank you. I will check the timing requirements but if that is the case it sounds like adding an external reset controller should work.

Answer 11 · 2019-02-17T23:30:05.000Z

@rowsail. It is a timing issue and power sequencing issue ... not just duration of the reset. So, unfortunately, simply adding an external reset controller will not do the job.

It is an expensive hack. To do it reliably you need a laser to cut the reset line.

Can you convert the design files to Kicad? If so, I can propose some changes.

Answer 12 · 2019-02-17T23:37:21.000Z

Ugh, that's bad news. I've got the schematics updated to rev C in Altium, but the PCB is still at rev A5. The additional components need to be placed and routed to bring it to rev C, but for what I need, the Altoid tin profile is not necessary at all and just pushes up the assembly cost. If you let me know the changes, I will include them and can send the update to you. Thanks, David.

Answer 13 · 2019-02-17T23:51:45.000Z

I updated my comment to be more specific. The main cause is the power sequencing. As I recall (it has been a while), I think I concluded the 8710a is likely on the wrong power bus. In theory, separating the reset and using a reset controller for the 8710a should work. But the underlying issue remains if you do not address the power sequencing. So, likely the issue is best resolved by addressing the power sequencing issue and separating the reset.

Answer 14 · 2019-02-18T01:30:03.000Z

Attached should be a PDF of the schematic. I have attempted to create a structured schematic, which I think is easier to understand for someone looking at it anew (i.e. me!). I have also made the changes which brought it to Rev C and added a link between a GPIO pin and the reset of the PHY. I'd be grateful for any input, good or bad.

BEAGLEBONEBLACK.pdf

Answer 15 · 2019-02-19T22:46:38.000Z

https://github.com/rowsail/-BeagleBone-Black

Answer 16 · 2019-06-03T16:24:51.000Z

I am designing a board based on BeagleCore and would like to add an Ethernet PHY similar to the one on the BBB.

Is Rowsail's solution correct, together with the fix in the MDIO driver? (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/Documentation/devicetree/bindings/net/mdio.txt?id=69226896ad636b94f6d2e55d75ff21a29c4de83b)

Answer 17 · 2020-09-04T02:19:46.000Z

@RobertCNelson is there a safe gpio line we can use for the MDIO reset? We will eventually need to address the microSD card cage issue and therefore do a hardware rev.

Answer 18 · 2020-09-04T02:24:35.000Z

I think the reasonable way to do this (for as much compatibility as possible) is to find an unused GPIO that is able to float high with a pull-up and add an AND gate ahead of the reset to the PHY MDIO reset. That way, old code will still rely on the SYS_RESETn, but new code can reset the PHY even after SYS_RESETn is high.
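As a rough illustration of the AND-gate idea above, the following Python sketch models the active-low PHY reset as the logical AND of SYS_RESETn and a pulled-up GPIO. The function name `phy_reset_n` is hypothetical and only models the logic levels; it says nothing about timing or the actual gate used on any board revision.

```python
def phy_reset_n(sys_reset_n: bool, gpio: bool) -> bool:
    """Model of the active-low PHY reset line behind an AND gate.

    True means the line is high (PHY released), False means low (PHY in reset).
    `gpio` defaults high on real hardware via the pull-up when software
    never touches it (assumption taken from the proposal above).
    """
    return sys_reset_n and gpio


# Print the full truth table.
for sys_reset_n in (False, True):
    for gpio in (False, True):
        state = "released" if phy_reset_n(sys_reset_n, gpio) else "held in reset"
        print(f"SYS_RESETn={int(sys_reset_n)} GPIO={int(gpio)} -> PHY {state}")
```

With the GPIO left floating high, the PHY simply follows SYS_RESETn, which is why old code keeps working; new code can pull the GPIO low to reset the PHY while SYS_RESETn stays high.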

Fixing the power sequencing would be nice such that this device is powered early enough for the SYS_RESETn to do the trick, but I'm worried about the validation process (risk) for that.

Answer 19 · 2020-09-04T21:26:47.000Z

I took a look at the technical reference manual and found this:

Caution must be used when implementing the nRESETIN_OUT as an bi-directional reset signal. Because of the short maximum time allowed using RSTTIME1, it does not supply an adequate debounce time for an external push button circuit. The processor could potentially start running while external components are still in reset. It is recommended that this signal be used as input only (do not connect to other devices as a reset) to implement a push button reset circuit to the AM335x, or an output only to be able to reset other devices after an AM335x reset completes.

It appears the BBB does not adhere to this caution. Moreover, the large capacitance on this signal line may cause issues due to the very slow signal rise time. It would likely be a good idea to properly debounce the reset button and remove the capacitor from this reset signal line.

I recall that when we did some work on this a few years ago we tested whether or not a reset controller would resolve the issue. If my recollection is accurate, what was reported to me was that a reset controller did not resolve the issue. Unfortunately I did not perform the testing so I am not certain of how the tests were carried out. I think there are reports from others elsewhere that attempted to use a reset controller and observed the same outcome.

I took a closer look today on the power sequencing and the TPS65217. The PMIC's default sequence has about 26ms of delay from the rise of LDO4. The LAN8710a datasheet indicates a minimum reset of 25ms is required. So, that delay plus the delay associated with the time constant should be adequate to reset the PHY.

Another thought is to increase PGDLY. It can be changed from 25ms to 100, 200 or 400ms.

Answer 20 · 2020-12-15T18:04:08.000Z

Fix in version C3.

Answer 21 · 2020-12-15T22:20:00.000Z

That looks like a lot of capacitance. Might want to double check the rise time spec on the 8710A.

Did anyone try changing PGDLY?

Answer 22 · 2020-12-16T22:20:03.000Z

Where is PGDLY set?

Answer 23 · 2020-12-16T22:56:30.000Z

I'm pretty sure PGDLY is in the TPS65217C, not sure we can reprogram that..

Answer 24 · 2020-12-17T02:23:47.000Z

> I'm pretty sure PGDLY is in the TPS65217C, not sure we can reprogram that..

Yeah, it is a TPS65217C register. I looked into this and I think it can be changed but I was not able to determine which driver to make the change to.

Answer 25 · 2021-03-30T06:50:04.000Z

> There might be a fix here: https://wp.josh.com/2018/06/04/a-software-only-solution-to-the-vexing-beagle-bone-black-phy-issue/

This is not a good fix: it causes most processor supplies to briefly power off while SYS_5V remains powered, so the VDD_3V3B regulator bug will cause the 3.3V supply on the P9 header to remain powered. Depending on what's connected to the beaglebone externally, this can easily cause external hardware to fry the AM335x's I/O.

> > I'm pretty sure PGDLY is in the TPS65217C, not sure we can reprogram that..
>
> Yeah, it is a TPS65217C register. I looked into this and I think it can be changed but I was not able to determine which driver to make the change to.

Changing its value at runtime is pointless since that's too late; you'd need to change its non-volatile programming. This is presumably possible using the programming sequence documented for the TPS652170 (section 7.6.1.1). However, this requires putting 8V on PWR_EN and therefore cannot be done in-situ.

As I also commented on that article linked above, my own testing results show that the primary cause of the phy problems is not the reset time but the rise time of the reset signal:

I’ve seen the inverted link led thing as well. It suggests the logic level of the led pin was somehow incorrectly recorded at reset, which is also the strapping option for REGOFF, hence the phy will not work in that case. In general, all of the phy problems (ranging from having an incorrect phy address to not working at all) appear to be due to incorrect strapping options being latched at reset.

Based on testing I’ve done the primary cause seems to be the slow rise of the reset line, which is caused by a 2.2μF capacitor on it (C24), apparently to ensure the phy’s specified reset timing is met, and to lesser extent by a 0.1μF capacitor (C30).

I’ve done some tests on a beaglebone (known to be susceptible to the phy issue) with a reset extender added to ensure reset timing is met and additional pull-up to decrease the rise time on reset deassertion. The impact on the phy failure rate was pretty clear:

2.4% (34/1431) with no external pull-up (just the on-board 10K).
1.0% (12/1189) with 1K pull-up.
0.4% (5/1153) with 240Ω pull-up.
0.15% (2/1354) with 1K pull-up and C24 removed.
0 failures in 16901 power cycles with both caps (C24 and C30) removed.

In other words, the faster the reset rise time, the less frequently it failed.
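For reference, the percentages can be recomputed from the raw counts quoted above. The short Python sketch below does that and, for the zero-failure case, adds an approximate 95% upper bound using the "rule of three" (3/n). That bound is my own addition for illustration and is not part of the original test report.

```python
# (configuration, failures, power cycles) as quoted in the list above
results = [
    ("on-board 10K only", 34, 1431),
    ("1K pull-up", 12, 1189),
    ("240 ohm pull-up", 5, 1153),
    ("1K pull-up, C24 removed", 2, 1354),
    ("C24 and C30 removed", 0, 16901),
]


def failure_rate_percent(failures: int, cycles: int) -> float:
    """Observed failure rate as a percentage."""
    return 100.0 * failures / cycles


def rule_of_three_upper_percent(cycles: int) -> float:
    """Approximate 95% upper bound on the true rate when zero failures were seen."""
    return 100.0 * 3.0 / cycles


for label, failures, cycles in results:
    rate = failure_rate_percent(failures, cycles)
    line = f"{label:26s} {rate:5.2f}%"
    if failures == 0:
        line += f"  (95% upper bound about {rule_of_three_upper_percent(cycles):.3f}%)"
    print(line)
```

Even with zero failures in 16901 cycles, the true failure rate could plausibly still be on the order of one in several thousand power cycles, which is worth keeping in mind for large deployments.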

How or why the phy is managing to misread the strapping options is still a mystery to me. We tried shorting the link led to make REGOFF pulled down more convincingly and reduce the opportunity for noise pickup, but it did absolutely nothing. Adding 0.25s delay between bootrom and U-Boot SPL, just in case the AM335x is released from reset earlier than the phy, likewise had zero impact. Perhaps the phy is just really intolerant of a slow-rising reset, but that seems very odd given that the datasheet actually suggests using an RC-circuit on the reset input to generate the required reset timing.

In short summary, the phy just sucks. Has it ever been considered to just swap it out for one that doesn't suck?

Answer 26 · 2021-03-30T06:56:25.000Z

Just to add, the fixes that I believe would solve the problem are:

replace the phy by one that doesn't have this problem
use a GPIO to reset the phy instead of using the processor reset signal
extend the reset time (by reprogramming PGDLY or using an external reset extender) and remove the capacitance on the reset line to ensure a sharp rising edge

Answer 27 · 2021-03-30T12:05:29.000Z

> is there a safe gpio line we can use for the MDIO reset? We will eventually need to address the microSD card cage issue and therefore do a hardware rev.

You could reuse the eMMC reset line, since this line has not worked for the intended purpose (keeping eMMC in reset to ensure it does not cause problems when reusing the eMMC pins) since the Micron eMMC (whose reset input is low-level-triggered) was swapped out for Kingston eMMC (whose reset input is rising-edge-triggered).

Answer 28 · 2021-08-30T15:02:20.000Z

Resolved in 5b06500. Reset GPIO is GPIO1_8.

Answer 29 · 2021-12-15T05:35:05.000Z

I only just noticed in the C3 schematic that the phy reset is being driven by an AND-gate with open-drain output and weak pull-up (10KΩ) and large capacitance (4.7μF) on its output, yielding a 47 ms RC-time, which seems like a bad idea considering slow rise time on the reset line appears to be the main cause for the phy problems in the first place. A push-pull output would have been more appropriate (and eliminates 2 components). It's not a huge deal since the reset gpio allows for multiple attempts at resetting the phy if necessary, but it is a bit of a weird choice.
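The 47 ms figure follows from tau = R * C. The Python sketch below reproduces it and, as a comparison, estimates the time constant of the original reset line. It assumes that C24 (2.2 μF) and C30 (0.1 μF) sit in parallel against the on-board 10K pull-up, and the 50% threshold is purely illustrative, not a value taken from the PHY datasheet.

```python
import math


def rc_time_constant_s(r_ohms: float, c_farads: float) -> float:
    """RC time constant in seconds."""
    return r_ohms * c_farads


def rise_time_to_fraction_s(tau_s: float, fraction: float) -> float:
    """Time for an RC-charged node to rise from 0 V to `fraction` of the supply."""
    return -tau_s * math.log(1.0 - fraction)


# C3 revision: 10K pull-up with 4.7 uF on the gate output (values from the comment above).
tau_c3 = rc_time_constant_s(10e3, 4.7e-6)

# Earlier revisions: 10K pull-up, C24 + C30 assumed to be in parallel on the reset line.
tau_old = rc_time_constant_s(10e3, 2.2e-6 + 0.1e-6)

for name, tau in (("C3", tau_c3), ("Rev B/C (assumed)", tau_old)):
    t_half = rise_time_to_fraction_s(tau, 0.5)  # illustrative 50% threshold
    print(f"{name}: tau = {tau * 1e3:.1f} ms, time to 50% = {t_half * 1e3:.1f} ms")
```

Under these assumptions the C3 reset line rises roughly twice as slowly as the original one, which is the concern raised above.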

Is C3 already in production btw? There's no EEPROM identifier listed for it yet, will it be A335BNLT00C3? I noticed the BBBs currently produced by Seeed erroneously identify themselves as A335BNLTEIA0 (Element14 BBB Industrial A0).

Answer 30 · 2021-12-21T14:26:24.000Z

It seems that completely removing C24 fixes the issue for us (powering the board through the cape connectors).
If I understand correctly the side effect should be that the reset button does not work reliably anymore, but we are not using that anyway.
Can anybody see a problem with this approach I am missing?

Answer 31 · 2021-12-24T11:24:58.000Z

@svdmark As shown in my comment earlier, in my experience removing C24 helped a lot but did not fix it completely. The sensitivity to this issue varies per board though, and the one I tested on was particularly sensitive. Also, the purpose of C24 is to ensure the minimum reset time for the phy is met, and removing it without somehow extending the power-on-reset duration will violate the reset timing requirements specified in the phy's datasheet. Whether or not that will cause problems in practice, I do not know.

And of course having to patch the board is not exactly an ideal workaround.

It seems the C3 revision with the new phy reset gpio is either currently shipping or about to be, based on this forum thread. If a small bit of logic is added to u-boot to reset the phy until it's working properly, this problem will finally be fixed.

@jadonk This issue should perhaps remain open until that software fix is actually implemented? The reset gpio by itself doesn't fix the problem if it's not being used.
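To make the "reset the phy until it's working properly" idea concrete, here is a minimal Python sketch of the retry logic. It is not U-Boot code: `set_reset_gpio` and `phy_ok` are hypothetical callbacks standing in for the GPIO write and for whatever health check is used (for example, checking that the PHY answers at the expected MDIO address), and the 30 ms assertion time is an assumption chosen to exceed the 25 ms minimum reset cited earlier in this thread.

```python
import time
from typing import Callable


def reset_phy_until_ok(
    set_reset_gpio: Callable[[bool], None],  # hypothetical: True = line high (released)
    phy_ok: Callable[[], bool],              # hypothetical: True if the PHY looks healthy
    max_attempts: int = 5,
    assert_s: float = 0.030,                 # assumed reset pulse width
    settle_s: float = 0.010,                 # assumed settle time after release
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Pulse the PHY reset until the health check passes; return the attempt count."""
    for attempt in range(1, max_attempts + 1):
        set_reset_gpio(False)  # assert reset (active low)
        sleep(assert_s)
        set_reset_gpio(True)   # release reset
        sleep(settle_s)
        if phy_ok():
            return attempt
    raise RuntimeError(f"PHY still not healthy after {max_attempts} resets")


# Demo with a simulated PHY that only comes up correctly on the second reset.
releases = {"count": 0}


def demo_set_reset(level: bool) -> None:
    if level:
        releases["count"] += 1


def demo_phy_ok() -> bool:
    return releases["count"] >= 2


attempts = reset_phy_until_ok(demo_set_reset, demo_phy_ok, sleep=lambda _s: None)
print(f"PHY came up after {attempts} reset attempt(s)")
```

Bounding the number of attempts keeps boot time predictable on boards where the GPIO is not connected to the PHY at all.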

Answer 32 · 2022-10-18T13:01:32.000Z

@RobertCNelson do you know if this has been implemented yet?

Answer 33 · 2022-10-18T15:37:02.000Z

@jadonk this is a todo... i think it's best to implement the "reset" in u-boot... mainline linux gpio-reset for phy's cpsw, is ongoing (last i checked)..

We should just do this in u-boot, blindly reset the gpio, connected or not on the "bone-black" target..

Answer 34 · 2022-10-18T15:39:05.000Z

Just like the icev2 is done... https://github.com/u-boot/u-boot/blob/master/board/ti/am335x/board.c#L789

Answer 35 · 2022-10-18T15:46:41.000Z

I agree. It should be innocuous to do it blind.
