Manage physical disks in Harvester

node-disk-manager

Node Disk Manager helps manage host disks by implementing disk partitioning and file system formatting.

Building

    make

Running

    ./bin/node-disk-manager

Features

  • Disk provisioning as Longhorn disks with a simple boolean.
  • Disk formatting if needed with a simple boolean.
  • Disk discovery, including existing block devices and hot-plugged disks.
  • Supports multiple storage controllers (IDE/SATA/SCSI/Virtio).
  • Supports virtual disks (a WWN on the disk is required for unique identification).
  • Device mapper and LVM are not yet supported.
  • The behaviour of multipath devices is undefined.

Architecture

The Node Disk Manager (a.k.a. NDM) is a simple Kubernetes controller that follows the well-known controller pattern. It uses Rancher's wrangler framework to construct the controller.

NDM is a single binary written in Go and designed to run as a Kubernetes DaemonSet. You can find more information about how NDM is shipped with Harvester in this Helm chart definition.

NDM has two main functionalities: disk discovery and disk provisioning. Each is handled by dedicated components in this project. We'll discuss each topic separately later. First, let us learn about the custom resource for NDM: blockdevices.

blockdevices Custom Resource

A blockdevice is a Kubernetes custom resource (CR) that represents a block device on a node. The blockdevice CR records low-level information about the block device from the operating system, for example file system status, mount point, and UUIDs. These details are all stored in status.deviceStatus.

The name of a blockdevice is a global identifier across all nodes in the cluster. For now, we recommend that any disk you want to provision has at least a WWN. The WWN helps the system identify the blockdevice resource globally and link it to the real block device in the operating system.

Besides the name field, the most important fields to know are spec.fileSystem.provisioned and spec.fileSystem.forceFormatted. The former indicates that the user expects the block device to be provisioned as a Longhorn disk for further use. The latter indicates that NDM should format the disk if it has not been formatted yet.
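To make the field paths concrete, here is a minimal Go sketch of the spec portion of a blockdevice. Only the paths spec.fileSystem.provisioned and spec.fileSystem.forceFormatted come from the text above; the type names, the struct layout, and the encodeSpec helper are assumptions made for illustration and are not copied from the NDM source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FileSystemSpec holds the two booleans described above.
// The struct layout is an assumption made for illustration.
type FileSystemSpec struct {
	Provisioned    bool `json:"provisioned"`    // user wants a Longhorn disk
	ForceFormatted bool `json:"forceFormatted"` // format if not yet formatted
}

// BlockDeviceSpec is a hypothetical stand-in for the CR's spec section.
type BlockDeviceSpec struct {
	FileSystem FileSystemSpec `json:"fileSystem"`
}

// encodeSpec renders the spec as JSON so the field paths are visible.
func encodeSpec(spec BlockDeviceSpec) (string, error) {
	out, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	spec := BlockDeviceSpec{
		FileSystem: FileSystemSpec{Provisioned: true, ForceFormatted: true},
	}
	out, err := encodeSpec(spec)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Setting both booleans, as in main above, corresponds to asking NDM to format the disk if needed and then hand it to Longhorn.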

Disk Discovery

As a DaemonSet workload, each NDM instance is responsible for the disks on its own node. Two components collect information about the disks on the node and create, update, or delete the corresponding blockdevice CRs.

The first is the scanner. It scans all supported block devices on the system, creates a blockdevice CR for any device that does not have one yet, and deletes the CR of any device that has been removed from the system. For block devices that need an update, it simply enqueues the blockdevice CR and lets the blockdevice controller handle the update path, which prevents possible race conditions. The scanner also scans the system periodically so that the controller can refresh device information when needed.
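The create/delete/enqueue split described above can be sketched as a set difference between the devices found on the system and the existing CRs. This is a simplified illustration: the Plan type and planScan function are hypothetical names, and the sketch enqueues every device that already has a CR rather than detecting which ones actually changed.

```go
package main

import (
	"fmt"
	"sort"
)

// Plan lists what one scan pass would do. All names are hypothetical.
type Plan struct {
	Create  []string // devices on the system that have no CR yet
	Delete  []string // CRs whose device is no longer on the system
	Enqueue []string // devices with a CR, handed to the controller
}

// toSet converts a slice of names into a set for membership checks.
func toSet(names []string) map[string]struct{} {
	set := make(map[string]struct{}, len(names))
	for _, n := range names {
		set[n] = struct{}{}
	}
	return set
}

// planScan compares discovered devices with existing CR names.
func planScan(onSystem, existingCRs []string) Plan {
	found, known := toSet(onSystem), toSet(existingCRs)
	var p Plan
	for name := range found {
		if _, ok := known[name]; ok {
			p.Enqueue = append(p.Enqueue, name)
		} else {
			p.Create = append(p.Create, name)
		}
	}
	for name := range known {
		if _, ok := found[name]; !ok {
			p.Delete = append(p.Delete, name)
		}
	}
	// Sort for deterministic output; map iteration order is random.
	sort.Strings(p.Create)
	sort.Strings(p.Delete)
	sort.Strings(p.Enqueue)
	return p
}

func main() {
	p := planScan([]string{"disk-a", "disk-b"}, []string{"disk-b", "disk-c"})
	fmt.Printf("create=%v delete=%v enqueue=%v\n", p.Create, p.Delete, p.Enqueue)
}
```

Note that the scanner in this sketch never writes to an existing CR itself; it only decides who should act on it.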

The other key component is udev, which uses Linux's dynamic device management mechanism. As a supplement to the scanner, udev behaves mostly the same way, but it responds to hot-plugged devices immediately instead of waiting for the next scan.

There is also a filter module. It comprises several filter functions, each with its own predicate, that determine which block devices should be collected by the scanner and udev.
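A chain of predicate filters like the one described above might look as follows. This is a sketch under stated assumptions: the Device and Filter types, the two example predicates, and the paths used in them are all hypothetical and do not reflect the filters NDM actually ships.

```go
package main

import (
	"fmt"
	"strings"
)

// Device is a hypothetical, trimmed-down view of a block device.
type Device struct {
	Path       string
	MountPoint string
}

// Filter reports true when a device should be excluded from collection.
type Filter func(d Device) bool

// excludePathPrefix builds a filter that rejects devices by path prefix.
func excludePathPrefix(prefix string) Filter {
	return func(d Device) bool { return strings.HasPrefix(d.Path, prefix) }
}

// excludeMountPoint builds a filter that rejects devices mounted at mp.
func excludeMountPoint(mp string) Filter {
	return func(d Device) bool { return d.MountPoint == mp }
}

// shouldCollect returns true only if no filter excludes the device.
func shouldCollect(d Device, filters []Filter) bool {
	for _, exclude := range filters {
		if exclude(d) {
			return false
		}
	}
	return true
}

func main() {
	// Example predicates only; chosen for illustration.
	filters := []Filter{excludePathPrefix("/dev/loop"), excludeMountPoint("/")}
	for _, d := range []Device{
		{Path: "/dev/sda"},
		{Path: "/dev/loop0"},
		{Path: "/dev/sdb", MountPoint: "/"},
	} {
		fmt.Println(d.Path, shouldCollect(d, filters))
	}
}
```

Keeping each predicate as its own function means the scanner and udev can share one filter list and new rules can be added without touching either caller.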

Disk Provisioning

The NDM controller watches for changes to blockdevice CRs and performs the corresponding actions, namely:

  • Format disk
  • Mount/Unmount filesystem
  • Provision/Unprovision disk to/from Longhorn
  • Update device status details

The action to perform is determined by the combination of spec.fileSystem, the device's formatting and mounting status, and status.provisionPhase. The last one indicates whether the block device is currently used by Longhorn.
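One way to picture how these inputs combine is a small decision function. The sketch below is not NDM's actual reconcile logic: the State type, the action names, and the ordering of the checks are assumptions, chosen only to show how desired state (spec) and observed state (status) can select one of the actions listed above.

```go
package main

import "fmt"

// State gathers the inputs named in the text. Field names are hypothetical.
type State struct {
	WantProvisioned bool // from spec.fileSystem.provisioned
	ForceFormatted  bool // from spec.fileSystem.forceFormatted
	Formatted       bool // device already has a file system
	Mounted         bool // file system is currently mounted
	InLonghorn      bool // derived from status.provisionPhase
}

// nextAction picks a single step toward the desired state.
// The ordering of the cases is an assumption for illustration.
func nextAction(s State) string {
	switch {
	case !s.WantProvisioned && s.InLonghorn:
		return "unprovision"
	case !s.WantProvisioned && s.Mounted:
		return "unmount"
	case !s.WantProvisioned:
		return "update-status"
	case !s.Formatted && s.ForceFormatted:
		return "format"
	case !s.Formatted:
		return "update-status" // nothing to mount without a file system
	case !s.Mounted:
		return "mount"
	case !s.InLonghorn:
		return "provision"
	default:
		return "update-status"
	}
}

func main() {
	s := State{WantProvisioned: true, ForceFormatted: true}
	// Walk a fresh disk through the steps by applying each action's effect.
	for i := 0; i < 4; i++ {
		action := nextAction(s)
		fmt.Println(action)
		switch action {
		case "format":
			s.Formatted = true
		case "mount":
			s.Mounted = true
		case "provision":
			s.InLonghorn = true
		}
	}
}
```

Returning one action per call mirrors the usual controller style of converging on the desired state over several reconcile passes.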

To avoid race conditions, the controller must be the only component that updates an existing blockdevice CR. Other components that need an update must enqueue the CR instead.
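The single-writer rule can be illustrated with a plain Go channel standing in for the controller's work queue. This is a conceptual sketch only: NDM builds its controller on the wrangler framework, and runController, the observe callback, and the channel-based queue here are hypothetical simplifications.

```go
package main

import (
	"fmt"
	"sync"
)

// runController drains the queue and is the only code that writes status.
// observe is a hypothetical callback that re-reads the device's state.
func runController(queue <-chan string, observe func(name string) string) map[string]string {
	status := map[string]string{}
	for name := range queue {
		status[name] = observe(name)
	}
	return status
}

func main() {
	queue := make(chan string)
	var wg sync.WaitGroup

	// Two producers (think scanner and udev) notice the same device.
	// Neither touches the status; each only enqueues the name.
	for _, source := range []string{"scanner", "udev"} {
		wg.Add(1)
		go func(source string) {
			defer wg.Done()
			queue <- "disk-a"
		}(source)
	}

	// Close the queue once all producers are done.
	go func() {
		wg.Wait()
		close(queue)
	}()

	status := runController(queue, func(name string) string { return "refreshed" })
	fmt.Println(status)
}
```

Because every write goes through runController, two concurrent notifications for the same device are applied one after the other instead of racing.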

Appendix

We recommend testing NDM with a SCSI device, because it carries a WWN.

The following sample libvirt XML creates a SCSI device with a WWN.

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tmp/libvirt_disks/harvester_harvester-node-0-sda.qcow2'/>
      <target dev='sda' bus='scsi'/>
      <wwn>0x5000c50015ac3bd9</wwn>
    </disk>

NOTE: If the device is created without a WWN, NDM uses the filesystem UUID as the unique identifier. This has some limitations. For example, the UUID is lost if the filesystem metadata is corrupted.

License

Copyright (c) 2024 Rancher Labs, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.