root zpool should use failmode=panic

Question

Closed this issue a year ago · 2 comments

See oxidecomputer/omicron#2766 for background. In the case of the root pool, there is only ever one vdev, it can never be removed or turned off because it's backed entirely by pinned RAM, and if it fails we have no hope of any other recovery path. Therefore it should be created with failmode=panic.

Unlike with persistent pools, panicking here will pretty much always lead to recovery; about the only way we can get here is kernel memory corruption. Rebooting will reload the pristine ramdisk contents and almost certainly allow normal operation to resume.

The one nasty exception will be if a ramdisk page gets retired due to a hardware memory error: because we'll probably get the same block of memory the next time around and we haven't loaded any retire store, we'll end up in a reboot loop. That's a separate bug: we'd like to be able to load a retire store before we have phase-2. This is, to put it mildly, a big old can of worms -- and one that's worth peeking inside only after we can, you know, diagnose DRAM faults in the first place.
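As a rough illustration of what "created with failmode=panic" could look like from the command line, the sketch below builds a `zpool create` invocation that sets the property at creation time. The pool name `rpool` and the device path `/dev/ramdisk/rpool` are placeholders chosen for the example, and the helper `zpool_create_argv` is hypothetical. The real change may well set the property through some other path than the CLI.

```python
import shlex

# Values the zpool "failmode" property accepts.
VALID_FAILMODES = ("wait", "continue", "panic")


def zpool_create_argv(pool, vdev, failmode="panic"):
    """Build the argv for creating a single-vdev pool with a given failmode.

    Hypothetical helper for illustration only; it does not run anything.
    """
    if failmode not in VALID_FAILMODES:
        raise ValueError(f"unknown failmode: {failmode!r}")
    return ["zpool", "create", "-o", f"failmode={failmode}", pool, vdev]


if __name__ == "__main__":
    # Placeholder pool name and ramdisk device path (assumptions).
    print(shlex.join(zpool_create_argv("rpool", "/dev/ramdisk/rpool")))
```

On a running system, `zpool get failmode <pool>` should show which mode is in effect.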

Answer 1 · 2023-04-05T02:59:50.000Z

We could definitely get a retire store (a packed nvlist as I recall) as a small blob over IPCC when the time comes.

Is the assumption with the memory error that we'd be able to write to the memory at boot, but then it would fault later when we go to read or re-write it? When we load the pool from disk into DRAM I don't think we're necessarily getting a contiguous physical backing chunk.

Answer 2 · 2023-04-05T03:04:52.000Z

That's what I'd guess will often happen. While we may not be allocating a single chunk, what happens prior to phase2 is mostly deterministic and I would not be surprised if we often end up with more or less the same set of pages. Not guaranteed, of course, but if we don't want it to happen it's especially likely.
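
To make the loop concrete, here is a toy model, not a description of the real allocator. It assumes early boot deterministically hands the ramdisk the lowest-numbered usable pages, that a bad page is diagnosed and recorded in a persistent retire store before the panic, and that the store is either consulted before allocation or not. All names (`allocate_ramdisk_pages`, `boots_until_stable`) and the page counts are made up for the example.

```python
def allocate_ramdisk_pages(total_pages, needed, retired=frozenset()):
    """Deterministically pick the lowest-numbered pages that are not retired."""
    free = [p for p in range(total_pages) if p not in retired]
    if len(free) < needed:
        raise MemoryError("not enough usable pages")
    return free[:needed]


def boots_until_stable(total_pages, needed, bad_pages, load_retire_store,
                       max_boots=5):
    """Return the first boot whose ramdisk avoids every bad page.

    Returns None if the system is still panicking after max_boots.
    """
    retire_store = set()  # survives reboots, but only helps if loaded early
    for boot in range(1, max_boots + 1):
        retired = frozenset(retire_store) if load_retire_store else frozenset()
        pages = allocate_ramdisk_pages(total_pages, needed, retired)
        hit = bad_pages.intersection(pages)
        if not hit:
            return boot
        # Assumption: the fault is diagnosed and recorded, then we panic.
        retire_store |= hit
    return None


if __name__ == "__main__":
    # Page 2 is bad; the ramdisk needs 4 of 16 pages.
    print(boots_until_stable(16, 4, {2}, load_retire_store=False))
    print(boots_until_stable(16, 4, {2}, load_retire_store=True))
```

In this model, skipping the retire store means every boot picks the same pages and hits the same bad one, while loading it before allocation lets the second boot route around the page. Real allocation is only "mostly deterministic", so the actual behavior would be less clear-cut.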