oxidecomputer/omicron

Sled Agent: Ensure that all U.2 users "let go" on disk expungement


This is a sub-issue of #4719

This issue tracks all sled agent usage of U.2s, and verifies that once a physical disk is marked expunged in the API, it is no longer in use by the Sled Agent.

Why?

Expungement is the term in the control plane for permanent removal of a sled or physical disk. Specifically, it refers to the removal of the "control plane object" representing this hardware - the same hardware may potentially be re-used if a new "control plane object" is allocated to it.

If we don't track down the following use-cases, then expunging a physical disk may not stop us from actually using it, which could result in unexpected behavior (e.g., still accessing storage from a disk that we explicitly attempted to stop using).

What

This issue attempts to track every spot in the sled agent where all_u2_zpools or all_u2_mountpoints is invoked, and to trace how long the returned paths remain in use.

  • Durable zone storage (datasets) through the omicron_zones_put API. These are currently being created before the zone is initialized (context), and assumed to exist when the zone is initialized. It is the responsibility of Nexus to ensure that these zones have been removed.
    • This is happening in the reconfigurator here:
      // Should we expunge the zone because durable storage is gone?
      if let Some(durable_storage_zpool) = zone_config.zone_type.zpool() {
          let zpool_id = durable_storage_zpool.id();
          if !sled_details.resources.zpool_is_provisionable(&zpool_id) {
              return Some(ZoneExpungeReason::DiskExpunged);
          }
      };
  • Transient zone filesystems through the omicron_zones_put API. At the time of writing this issue, these filesystems are selected locally within the sled agent in a function called validate_storage_and_pick_mountpoint. Making Nexus aware of these transient zone filesystem placements, and removing the associated zones when a disk is expunged, is tracked by #5048.
    • #5931 makes Nexus aware of the filesystem placements
    • #5952 expunges zones using transient zone filesystems on expunged disks
  • Adding probes to a sled also launches a zone, which necessitates picking a transient zone filesystem. Like other zones, this code selects a random U.2 for filesystem storage, which may later be expunged. In order to avoid disrupting system operation, Nexus must be aware of this placement decision to manage fault domains accurately.
  • Zone bundles manage debug datasets across all U.2s, and on disk expungement, should stop accessing these directories.
    • #5965 ensures that U.2s are no longer queried after expungement
  • Instances launch zones, and use a sled-selected U.2 for filesystem storage. If the disk associated with an instance is expunged, the propolis zone should be destroyed, and possibly re-allocated to a different zpool.
    • #5965 removes instances using storage on expunged U.2s
  • Dump Device Management: The sled agent uses debug datasets from U.2s, if they are present (see: https://github.com/oxidecomputer/omicron/blob/main/sled-agent/src/dump_setup.rs), and these should no longer be considered if the disk has been explicitly expunged.
    • #5965 fixes this issue, ensuring that disk expungement propagates to the dump device manager before returning.
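
The zone-related items above share one rule: a zone should be expunged if any zpool it depends on, durable or transient, is no longer provisionable. The following is a minimal, self-contained sketch of that rule, not the actual omicron code. Only zpool_is_provisionable and ZoneExpungeReason::DiskExpunged appear in the snippet quoted above; ZpoolId as an integer, SledResources, ZoneConfig, its fields, and zone_expunge_reason are hypothetical stand-ins introduced for illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a zpool identifier (the real type is not shown here).
type ZpoolId = u32;

/// Mirrors the reason named in the reconfigurator snippet above.
#[derive(Debug, PartialEq)]
enum ZoneExpungeReason {
    DiskExpunged,
}

/// Hypothetical view of a sled's resources: the zpools still backed by in-service disks.
struct SledResources {
    provisionable_zpools: BTreeSet<ZpoolId>,
}

impl SledResources {
    /// Same name as the call in the snippet above; this body is an assumption.
    fn zpool_is_provisionable(&self, id: &ZpoolId) -> bool {
        self.provisionable_zpools.contains(id)
    }
}

/// Hypothetical zone description: a zone may use a durable dataset zpool,
/// a transient filesystem zpool, both, or neither.
struct ZoneConfig {
    durable_zpool: Option<ZpoolId>,
    transient_zpool: Option<ZpoolId>,
}

/// Returns a reason to expunge the zone if any zpool it uses is gone.
fn zone_expunge_reason(
    resources: &SledResources,
    zone: &ZoneConfig,
) -> Option<ZoneExpungeReason> {
    // Check both kinds of storage; `flatten` skips the ones that are unset.
    for zpool in [zone.durable_zpool, zone.transient_zpool].iter().flatten() {
        if !resources.zpool_is_provisionable(zpool) {
            return Some(ZoneExpungeReason::DiskExpunged);
        }
    }
    None
}
```

Checking the transient zpool in the same pass as the durable one is only possible once Nexus knows the transient placement, which is what #5931 provides.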