facebook/openbmc

Flashy will break on AST2600

lhl2617 opened this issue · 6 comments

127f18d points out that different SOC has different address space, and thus AST_SRAM_VBS_BASE needs to be defined differently.

In flashy, vboot-util is "built-in"--the logic is replicated within the code, and can be seen here: https://github.com/facebook/openbmc/blob/helium/tools/flashy/lib/utils/vboot.go.
This replication, as recommended by @lsiudut et al., is due to early vboot-util bugs causing flash failures--I do not recall specifics, but it most likely is related to this commit: 164ec0b.
The major benefit of having vboot-util logic within flashy ties to flashy's motivation itself: the ability of applying hotfixes within flashy's code to flash devices with outdated utilities (incl. vboot-util) causing failing flashes.

The problem arises here: AST_SRAM_VBS_BASE is hardcoded in flashy, and it'd be hard to replicate the logic:

const AST_SRAM_VBS_BASE = 0x1E720200

As of now, I think vboot-util should be stable enough, so the trivial fix is to rely on vboot-util within the system itself and rip the code off flashy, but the risk of such utilities causing large-scale flash failures still exist.

A better fix would be to somehow dynamically assign AST_SRAM_VBS_BASE in flashy. Opening this issue to discuss further on how we can do this.

cc: @zhdaniel12 @kawmarco @amithash

Addendum

There are other issues I wish to discuss as well, most notably keeping flash utilities in harmony. Currently fw-util is understandably the gold standard for up-and-coming platforms, and thus we'd need to figure out a way to keep flashy etc. in sync. There are however differences in how flashy and fw-util works (esp vboot flashing--see

var flashCpAndValidate = func(
: flashy skips the 84K RO region, whereas fw-util makes a new image and patches the image with the flash device's RO region).

Another issue is one that I've recently tackled in f4e9048 in response to f137c9a . An extra note that the flashy fix is in fact redundant, as flashy's validation is not tied to 32MB, and flashcp (both flashy's and the normal implementation) does a health check to make sure the size of the image is smaller than that of the device. The fw-util change is likely in response to the check_image implementation which assumes 32MB max image size (there is a calloc for that exact size here to validate images). This is a blocker for >32MB image sizes, so I'm flagging this up as well to @amithash. I reckon a similar fix as to how @zhdaniel12 implemented the vboot-util fix should suffice on newer platforms with bigger image sizes.

There was also a fbnd -> northdome issue: 6e7e782 & e2cf878
I think I wrote internal documentation about platform issues in the internal Wiki before I left, but I do not recall now.

In short, maintaining these tools with vague dependencies is hard, and responding to them passively would prove problematic in the long run. It'd be nice if we can discuss on how to deal with these problems.

Understandably, a lot of docs + tooling are internal--I will be starting in LDN at 19th July, and while unfortunately it seems that I will be unable to join the DUB/MPK based OpenBMC PE team, I'd be very happy to connect and jump in a meeting to discuss these issues.

I researched further: perhaps flashy should also have its own soc-util?

https://github.com/facebook/openbmc/blob/helium/meta-aspeed/recipes-utils/soc-utils/files/soc-utils.sh

thanks @lhl2617, recently we landed 1d35aa0, thus flashy shall be able to fix flashy.

@zhdaniel12 Working on a fix :) Thanks!

@lhl2617 I don't think w should rely on the new structures to detect the system as it won't work on the systems that are not having it implemented. This applies to golden images across the fleet.

@lsiudut good point, I'm looking at it now. My plan was to fall back to the default one for AST2400/AST2500.

I can however read off the device tree as in 1d35aa0