Sled Agent: Ensure that all U.2 users "let go" on disk expungement
This is a sub-issue of #4719
This issue tracks all sled agent usage of U.2s, and verifies that once a physical disk is marked expunged in the API, it is no longer in use by the Sled Agent.
Why?
Expungement is the term in the control plane for permanent removal of a sled or physical disk. Specifically, it refers to the removal of the "control plane object" representing this hardware - the same hardware may potentially be re-used if a new "control plane object" is allocated to it.
If we don't track down the following use-cases, then expunging a physical disk may not stop us from actually using it, which could result in unexpected behavior (e.g., still accessing storage from a disk that we explicitly attempted to stop using).
What
This issue attempts to track all spots in the sled agent where all_u2_zpools or all_u2_mountpoints are invoked, and traces how long those output paths are claimed.
- Durable zone storage (datasets) through the omicron_zones_put API. These are currently being created before the zone is initialized (context), and assumed to exist when the zone is initialized. It is the responsibility of Nexus to ensure that these zones have been removed.
- This is happening in the reconfigurator here: omicron/nexus/reconfigurator/planning/src/planner.rs, lines 674 to 680 in 01d8b37
- Transient zone filesystems through the omicron_zones_put API. At the time of writing this issue, these zone filesystems are selected locally within the sled agent in a function called validate_storage_and_pick_mountpoint. Making Nexus aware of these transient zone filesystem placements, and removing the associated zones when a disk is expunged, is tracked by #5048.
- Adding probes to a sled also launches a zone, which necessitates picking a transient zone filesystem. Like other zones, this code selects a random U.2 for filesystem storage, which may later be expunged. In order to avoid disrupting system operation, Nexus must be aware of this placement decision to manage fault domains accurately.
- Zone bundles manage debug datasets across all U.2s, and on disk expungement, should stop accessing these directories.
- #5965 ensures that U.2s are no longer queried after expungement
- Instances launch zones, and use a sled-selected U.2 for filesystem storage. If the disk associated with an instance is expunged, the propolis zone should be destroyed, and possibly re-allocated to a different zpool.
- #5965 removes instances using storage on expunged U.2s
- Dump Device Management: The sled agent uses debug datasets from U.2s, if they are present (see: https://github.com/oxidecomputer/omicron/blob/main/sled-agent/src/dump_setup.rs), and these should no longer be considered if the disk has been explicitly expunged.
- #5965 fixes this issue, ensuring that disk expungement propagates to the dump device manager before returning.
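The "let go" requirement shared by the items above can be sketched as a small bookkeeping structure. This is only an illustration, not the sled agent's actual implementation: `U2Tracker`, `ZpoolId`, `DiskPolicy`, `Consumer`, and all of their methods are hypothetical names invented for this sketch. It assumes each consumer registers a claim on a U.2 zpool, and that expungement both hides the pool from future selection and returns the claims that must be torn down.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical identifier for the zpool on a U.2 (not the real omicron type).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ZpoolId(String);

/// Hypothetical per-disk policy as seen by the sled agent.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DiskPolicy {
    InService,
    Expunged,
}

/// Hypothetical consumers, named after the use-cases listed in this issue.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Consumer {
    ZoneFilesystem(String),
    ZoneBundles,
    Instance(String),
    DumpDevice,
}

/// Tracks which consumers currently claim which U.2 zpool.
#[derive(Default)]
struct U2Tracker {
    policy: BTreeMap<ZpoolId, DiskPolicy>,
    claims: BTreeMap<ZpoolId, BTreeSet<Consumer>>,
}

impl U2Tracker {
    /// Registers a newly observed disk as in service.
    fn add_disk(&mut self, id: ZpoolId) {
        self.policy.insert(id, DiskPolicy::InService);
    }

    /// Returns only in-service pools, so expunged disks are never offered
    /// to callers picking a mountpoint or a debug dataset.
    fn usable_u2_zpools(&self) -> Vec<ZpoolId> {
        self.policy
            .iter()
            .filter(|(_, policy)| **policy == DiskPolicy::InService)
            .map(|(id, _)| id.clone())
            .collect()
    }

    /// Records a claim. Returns false for unknown or expunged pools.
    fn claim(&mut self, id: &ZpoolId, consumer: Consumer) -> bool {
        if self.policy.get(id) != Some(&DiskPolicy::InService) {
            return false;
        }
        self.claims.entry(id.clone()).or_default().insert(consumer);
        true
    }

    /// Marks the pool expunged and returns every consumer that must let go.
    fn expunge(&mut self, id: &ZpoolId) -> Vec<Consumer> {
        if let Some(policy) = self.policy.get_mut(id) {
            *policy = DiskPolicy::Expunged;
        }
        self.claims
            .remove(id)
            .map(|set| set.into_iter().collect())
            .unwrap_or_default()
    }
}
```

In this sketch, the caller of `expunge` is responsible for acting on the returned list (for example, destroying a propolis zone or dropping a debug dataset from the dump device manager) before reporting the expungement as complete.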