oxidecomputer/helios

Ensure Gimlet C/D SPI flash is run no faster than 33 1/3 MHz

Closed this issue · 15 comments

We've noticed instability on some Gimlet C instances (particularly '43) when their flash is run faster than 33 MHz. We think we've got a capacitance issue on some of the nets.

All the machines we've tested seem stable at 33, so, we should configure the PSP and later code to use that speed or lower. It's possible you're already at this speed or lower, in which case, no action required!

As part of this we should also evaluate the rest of the EFS settings such as whether or not we should be using QSPI or not.

@refugeesus notes that the host may actually be stable at higher speeds, because it doesn't have to go through the translator. If you'd like to try running higher than 33, test carefully.

daym commented

I've checked milan-1.0.0.a/Specification/55758_2.02.pdf and they don't document the default speed they operate the SPI at (which is what we most probably end up using right now).

So it would be good for us to specify the speed explicitly with something like this (also for newer than Rome):

[...]
	processor_generation: "Milan",
	spi_mode_zen_rome: {
		fast_speed_new: "_33_33MHz",
		read_mode: "Normal33_33Mhz",
		micron_mode: "SupportMicron"
	},
[...]
daym commented

Without doing anything to the parameters, on milan ethanol x 1.0.0.a we have:

fec10000,0x24::map fec10000
fec10000,0x24::dump
                   \/ 1 2 3 4 5 6 7 8 9 a b c d e f v123456789abcdef
        fec10000: b700c00f 00000000 00000000 20002202 ..............".
        fec10010: 06200404 06049f05 030b0a02 ff980600 ................
        fec10020: 13073331 08202020 0c14060e c0d4c014 ..31............

And on milan gimlet b 1.0.0.a (b9) we have:

> fec10000,0x24::map fec10000
> fec10000,0x24::dump
                   \/ 1 2 3  4 5 6 7  8 9 a b  c d e f  v123456789abcdef
        fec10000:  b700c00f 00000000 00000000 20002202  ..............".
        fec10010:  06200404 06049f05 030b0a02 ff980600  ................
        fec10020:  13073331 08202020 0c14060e c0d4c014  ..31............

Contains:

  • SPIx00000000 (FCH::ITF::SPI::SPI_Cntrl0) with SPI=FEC1_0000h; value is 0x0fc0_00b7 = 0b1111_1100_0000_0000_0000_1011_0111, so SpiReadMode = 0b000 = Normal read (up to 33 MHz)
  • SPIx00000020 (FCH::ITF::SPI::Spi100Enable) with SPI=FEC1_0000h; value is 0x3133_0713 = 0b11_0001_0011_0011_0000_0111_0001_0011 (so NormSpeed = 3 = 16.66 MHz, FastSpeedNew = 1 = 33.33 MHz, AltSpeedNew = 3 = 16.66 MHz, TpmSpeed = 3 = 16.66MHz, UseSpi100 = 1 = new SPI100 engine)

This was measured in nanobl-rs, so before illumos.

daym commented

FWIW it also boots with

diff --git a/etc/milan-gimlet-b.efs.json5 b/etc/milan-gimlet-b.efs.json5
index 9115856..aa393ec 100644
--- a/etc/milan-gimlet-b.efs.json5
+++ b/etc/milan-gimlet-b.efs.json5
@@ -1,5 +1,10 @@
 {
        processor_generation: "Milan",
+       spi_mode_zen_rome: {
+               fast_speed_new: "_33_33MHz",
+               read_mode: "Quad144",
+               micron_mode: "SupportMicron"
+       },
        psp: {
                PspDirectory: {
                        entries: [

SPI fast mode is probably only configured for a short time while doing burst reads, so even with that, the values in SPIx00000000 and SPIx00000020 are the same most of the time.

Check docs for defaults and then check what we have configured this for. Make sure not over 33 1/3.

daym commented

AGESA docs:

PcdResetSpiSpeed (Default: 16.66 MHz)
PcdResetFastSpeed (Default: 33 MHz)
PcdResetMode (Default: 33 MHz, no QSPI)

But since those are edk2 settings, not sure that those apply to us (we don't set up the efs values for those at all, so they default to 0xff)

daym commented

Right now, we are using:

Efh {
    ...
    spi_mode_bulldozer: EfhBulldozerSpiMode { read_mode: 255, fast_speed_new: 255, _reserved: 255 }, 
    spi_mode_zen_naples: EfhNaplesSpiMode { read_mode: 255, fast_speed_new: 255, micron_mode: 255 },
    spi_mode_zen_rome: EfhRomeSpiMode { read_mode: 255, fast_speed_new: 255, micron_mode: 85 } }
}

Relevant one is spi_mode_zen_rome.

The values read_mode: 255, fast_speed_new: 255 are not documented in amd # 55758 2.02 Aug 2022, so that's... bad. I do think that the intent of AMD was to allow other values here, if the older config blobs for bulldozer and naples are any indication (also that the intent is to have all the unused ones be 0xff). I do think they handle the case 0xff.

To have some data on what happens then, I booted and read the value of register FCH::ITF::SPI::SPI_Cntrl0 after booting (default seems to be ReadMode = 0, so up to 33 MHz).

Also FCH::ITF::SPI::Spi100Enable, to check whether that enabled and also to check the config settings (default seems to be NormSpeed = 16.66 MHz, FastSpeed = 33.33 MHz, AltSpeed = 16.66 MHz, TpmSpeed = 16.66 MHz, SPI CS deassertion time: 7 cycles)

Specifically SPI_Cntrl0 at 0xFEC1_0000 (size 32 bit) and Spi100Enable at 0xFEC1_0020 (size 32 bit).

I got

[0]> FEC1_0000\X
0xfec10000:     fc000b7         
[0]> fec1_0020\X
0xfec10020:     31330713        

First value is 0xb7 higher than the default. The bottom 8 bits are just the spi opcode, so that just means someone used the spi (as opposed to reconfigured it).
Second value is 0x1 higher than the default. The bottom bit is whether the new SPI100 engine is enabled. So it is in fact enabled.

daym commented

In the end we default spi_mode_zen_rome if it's not given. More specifically, exactly this line: https://github.com/oxidecomputer/amd-host-image-builder/blob/baedf259451c4562b89a7c655b776683f04be0c5/amd-host-image-builder-config/src/lib.rs#L274

The default is 0xff, 0xff, 0x55.

daym commented

Edited rom image directly at location 0xFA_0047 for SpiReadMode (raw value < 8), location 0xFA_0048 for SpiFastSpeedNew (raw value < 6), 0xFA_0049 for micron detect flag (raw value can be 0x55) and got the following results for SPI_Cntrl0 and Spi100Enable with boot-time kmdb on gimlet b9.

SpiReadMode SpiFastSpeedNew SPI_Cntrl0 Spi100Enable
0, 1, 2, 3, 4, 5, 6, 7, 0xff 0 (66.66 MHz) 0xfc000b7 0x31330713
0, 1, 2, 3, 4, 5, 6, 7, 0xff 1 (33.33 MHz) 0xfc000b7 0x31330713
0, 1, 2, 3, 4, 5, 6, 7, 0xff 2 (22.22 MHz) 0x2fc000b7 0x31330713
0, 1, 2, 3, 4, 5, 6, 7, 0xff 3 (16.66 MHz) 0x2fc400b7 0x31330713
0, 1, 2, 3, 4, 5, 6, 7, 0xff 4 (100 MHz) does not boot does not boot
0, 1, 2, 3, 4, 5, 6, 7, 0xff 5 (800 kHz) doesn't boot (in 24 h) doesn't boot (in 24 h)
0xff 0xff 0xfc000b7 0x31330713

For whatever it's worth, I just measured the SPI clock on my Gimlet C running latest (Saleae, 100 MHz on the SP3 side of R5792):

2023-06-20-12-45-17_2148x249

So it appears to be running at 16.67 MHz. From eyeballing it, the entire time period at which SPI is being clocked lasts about 6.1 seconds -- and the bulk of that is happening over a 2.2 second period:

2023-06-20-14-47-28_2609x424

There is opportunity here, but not a huge amount -- and certainly at non-zero risk! This should be left at 16.67 MHz, but should certainly be setting it with defined values.

daym commented

Bryan, thank you for doing that measurement! That makes me a lot less worried now. Before, I'd have kicked myself if the reality was different to what the hardware is "supposed to be" set up as according to docs.

There is a little wrinkle that AMD's new engine can change spi speed for a little while while in operation depending on what they are doing it for (Tpm, Alt mode, Fast mode and so on).
See SPIx00000044 (FCH::ITF::SPI::ModeByte) ALTOPCODE_EXCUTE.

Anyway, what we want is to set up the config with defined values such that SPI_Cntrl0 = 0xfc000b7 (as it is now without any change). Apparently, that means efs.SpiFastSpeedNew = 1 (33.33 MHz) and efs.SpiReadMode don't care.

I think what is going on is they use 16.67 MHz as a default that cannot be changed via efs, and for large bulk transfers they allow us to set a higher rate that will be used for a while.

Possible values for FastSpeedNew:

#[non_exhaustive]
pub enum SpiFastSpeedNew {
    #[cfg_attr(feature = "serde", serde(alias = "66.66 MHz"))]
    _66_66MHz = 0,
    #[cfg_attr(feature = "serde", serde(alias = "33.33 MHz"))]
    _33_33MHz = 1,
    #[cfg_attr(feature = "serde", serde(alias = "22.22 MHz"))]
    _22_22MHz = 2,
    #[cfg_attr(feature = "serde", serde(alias = "16.66 MHz"))]
    _16_66MHz = 3,
    #[cfg_attr(feature = "serde", serde(alias = "100 MHz"))]
    _100MHz = 0b100,
    #[cfg_attr(feature = "serde", serde(alias = "800 kHz"))]
    _800kHz = 0b101,
    DoNothing = 0xff, // well maybe
}

If it's not too much work, could you make a measurement right when it's loading massive blobs from the flash?

daym commented

The efs.SpiReadMode is mostly how many parallel data lines there are via SPI. We could eventually increase that (to quad) if we wanted and our hardware has those lines; frequency staying the same for that(common for all lines). That is configurable via this:

#[non_exhaustive]
pub enum SpiReadMode {
    #[cfg_attr(feature = "serde", serde(alias = "Normal 33.33 MHz"))]
    Normal33_33Mhz = 0b000, // up to 33.33 MHz
    /// First digit of name: number of lines (bits) for the command
    /// Second digit of name: number of lines (bits) for the address
    /// Third digit of name: number of lines (bits) for the data
    Dual112 = 0b010,
    /// First digit of name: number of lines (bits) for the command
    /// Second digit of name: number of lines (bits) for the address
    /// Third digit of name: number of lines (bits) for the data
    Quad114 = 0b011,
    /// First digit of name: number of lines (bits) for the command
    /// Second digit of name: number of lines (bits) for the address
    /// Third digit of name: number of lines (bits) for the data
    Dual122 = 0b100,
    /// First digit of name: number of lines (bits) for the command
    /// Second digit of name: number of lines (bits) for the address
    /// Third digit of name: number of lines (bits) for the data
    Quad144 = 0b101,
    #[cfg_attr(feature = "serde", serde(alias = "Normal 66.66 MHz"))]
    Normal66_66Mhz = 0b110, // up to 66.66 MHz
    Fast = 0b111,
    DoNothing = 0xff, // well maybe
}

Running with latest (that is, 1.0.0.a), there are no accesses faster than 16.67 MHz. I'm attaching the Saleae capture filre (100 MHz) should one wish to look at it in more detail, but at no time are accesses faster than 16.67 MHz.

daym commented

Successfully tested (including timing it--before: 60 s; afterwards: 60 s) with the following config:

spi_mode_zen_rome: {
    fast_speed_new: "16.66 MHz",
    read_mode: "Normal up to 33.33 MHz",
    micron_mode: "SupportMicron"
},

I think read_mode is just how many lanes in parallel the SPI controller AMD has uses, and they abused it to also allow artificially limiting the speed once more (why?!). Anyway, it's the slowest read_mode setting they have.

See also amd# 55758