/illumos-zonemgr

Automate illumos local zone management with inclination towards Golden Image and Disposable zones

Primary LanguageShell

About illumos-zonemgr

This repo holds the script to manage illumos zones lifecycle for essentially disposable zones that are clones of a "golden" template environment, which has its own update schedule, if intended.

The initial intended use-case is management of dedicated single-use build environments with just enough tools and prerequisites installed for compilation of the illumos-gate and userland stacks (including disposable zones with just the minimal set of known tools and prerequisites for a single component build).

This file and its contents are supplied under the terms of the
Common Development and Distribution License ("CDDL"). You may
only use this file in accordance with the terms of the CDDL.

A full copy of the text of the CDDL should have accompanied this
source. A copy of the CDDL is also available via the Internet at
http://www.illumos.org/license/CDDL.

Copyright 2016-2017, Jim Klimov. All rights reserved.

Terminology

Common definitions (from the Solarish ecosystem):

  • Boot Environment (BE) — a filesystem with contents sufficient to instantiate a fully-fledged usable system for a given architecture and environment (e.g. kernel and drivers may be missing in a BE for lightweight virtualization container).

  • Global Zone (GZ) — the "physically installed" illumos-based distro in a barebone or VM computer, running a kernel and interfacing with hardware. It arranges the resource delegation and separation needed to run local zones.

  • Local Zone or Non-Global Zone (LZ or NGZ) — a container running in the same kernel as the GZ, providing a lightweight virtualized environment (with individual root filesystem, networking stack, process and user-account spaces) orchestrated by the GZ. If there is any resource sharing with other environments (e.g. access to common filesystems) — then it is intentionally done by the setup in the GZ, and/or by general approach to shared resources (usually via net — e.g. NFS, LDAP, etc.)

  • Zone Brands — some virtualized environments want to seem different from the hosting GZ. This may be a nuance like packaging with arbitrary versions of the software, rather than keeping in sync with versions in the GZ, but still running in the same kernel; or there can be considerable differences like syscalls for emulation of a different kernel (e.g. older Solaris release zones, or Linux zones). Implementation of brands may be a mix of scripts for system management methods and some binary code for kernel/syscall emulation.

  • Operating Environment (OE) — a BE, such as the GZ or LZ root filesystems or an installation/recovery miniroot, and the actual instance of an operating system running over such a filesystem.

Local definitions (introduced by this project):

  • Golden Image Zone (GIZ) — a local zone prepared in advance that contains minimal systems setup and a defined set of packages, and that is not intended for direct booting — but rather for snapshotting and cloning into applied deployments. The BE filesystem  for such zones is reasonably called a Golden Image (GI). The workflow below considers golden-image zones created from scratch and from snapshots of earlier (more generic) GI zones.

  • Disposable-Image Zone (DIZ) — a clone of some particular golden-image zone which is intended to persist while running some time- and scope-limited task (e.g. a build) possibly with more specific resources designated to help fulfill this job, and which is to be destroyed afterwards.

Notes for development and contributors/hackers

The following blocks of text are intended mainly for developers of the script, including prospect contributors and power users who want to figure out how it works (or is supposed to, at least) and why. And to keep track of how much of the plan is completed. And to have a plan at all :)

Expected workflow

All of the below assumes working from an account with RBAC privileges sufficient to create, manage and destroy local zones and datasets (e.g. a Primary Admin role) via pfexec, and doing so from a Global zone.

  • Slurp script configs (hardcoded defaults; system, user level and command-line passed config file overrides) — e.g. container root datasets

  • Zone-creation task stack:

    • Create the golden-image containment dataset if missing yet

      • Disable auto-snaps

      • Enable strong compression

      • Enable dedup?

    • Create the golden-image zone from scratch, if the name is not yet occupied:

      • Let specify image template type (linked, brand, etc.) to set up the config

      • Minimal config zone-wise (no resources, filesystem/dataset provisions, networking, user accounts…​); set autoboot:=false

      • Let specify package source(s) — URL to IPS repo, possibly just one publisher may be allowed for initial installation, but more may be added after initial "image" setup

      • Consider packaging, tarballs, zfs-send images…​

    • Create a golden-image zone from existing golden-image zone (clone current state, last or specified snapshot):

      • Add metadata (zonecfg field?) to track which GI this image was built from

      • Specify package(s) to remove from the zone compared to original while creating it (e.g. to avoid conflicts for subsequently added packages)

      • Specify package(s) to add into the zone compared to original while creating

      • Same limitations (e.g. simplicity) as above

    • Create the disposable-zone containment dataset if missing yet:

      • Disable auto-snaps

      • Enable moderate or no compression

      • Disable dedup

    • Create a disposable zone from specified golden-image zone (clone current state, last or specified snapshot):

      • Add resource definitions (datasets/lofs, network, user accounts/ldap, …​)

      • Script cloning of file-based user/shadow and group account data from GZ to LZ for respective specified accounts

      • Manage VNICs?

  • General maintenance actions:

    • iterate all golden-image zones to upgrade them (e.g. from crontab)

  • General actions against any managed zones (GI or DZ):

    • generate a unique name-part for the zone (hash of concat of sorted requested pkg names?) that may get prefixed and/or suffixed for GI and DZ instances

    • snapshot a zone

    • clone a zone (clone current state, last or specified snapshot)

    • pkg-update contents of a (golden-image?) zone, snapshot after success

    • install specified package(s) into the named zone, snapshot after success

    • start, stop a zone

    • destroy a zone (including config)

    • halt and roll back to specified snapshot (e.g. reuse same)

    • run a command inside the zone (via zlogin), maybe as a specified account

Notes for development and architecture

  • Certain limitations, such as the initial set of installed packages and the amount of package repositories or other media (archives, snapshots) that can be used to seed a local zone’s filesystem, come from the existing zone brand scripts which differ between distributions. While I have done some private hacks to circumvent these limitations in my experiments, a proper solution setup might be to upstream these experiments or define and package custom zone brands - but either way the proper fix is outside a script like this one.

  • When developing and testing, keep in mind that non-root user with specified RBAC (pfexec, not sudo) privileges should be able to perform operations. This especially concerns operations in the global zone where files have to be read or written by a root-like entity due to access permissions. It is likely that during initial PoC of the logic some corners are cut — so do at least leave a TODO/TBD/FIXME note that this would have to be addressed later. Some hints on implementation:

    • A MY_UID value is tracked to check quickly if running as root (0).

    • For operations in local zones, ready_zone() and zlogin -S can be used.

    • For operations in global zone, the set of requested permissions includes the backup operations, which allows to cpio files with elevated rights.

Notes for testing

  • A basic test-driver script is added at test/zonemgr_selftest.sh to run further snippets maintained in test/tests/*.test files (per its default search) or in any full filename it can source.

  • Test-suite is wanted :) It would help to both keep track of usage scenarios (implemented ⇒ non-regression, and planned ⇒ TDD until it works), and to serve as practical documentation of script’s abilities and the command-line spells.

  • Inside the test suite, it can be helpful to provide a configuration file so that only specific ZFS tree can be manipulated with different settings, etc. and refer to that file by export ZONEMGR_CONF=/path/to/test.conf.

  • For development, if the script is called with DEVEL_DEBUG among its CLI options, then the rest of options can hook into a single procedure and its arguments. You can take advantage of CLI parser before this token, or use it right away as the first one on the command line (beware that many of the default variable values would not yet be applied).

  • For testing, the script supports being included. Maybe the DEVEL_DEBUG mode would move into an extra helper for unit-testing, later in evolution.

Usage examples

The following paragraphs intend to show specific examples of zonemgr usage.

Beside command-line options listed in the help text, internal configuration variables can also be set using configuration files, overlaying system and user defaults provided in /etc/zones/zonemgr.conf, ${HOME}/.zonemgr.conf and ${ZONEMGR_CONF} (any of these is only sourced if present), and a file specified on command-line with the -c /path/to/myconfig.conf option.

  • Create a default GIZ with no details (system-default brand, no VNICs, no special datasets, no packaging customizations, etc):

:; ./zonemgr create-giz
  • Create a default GIZ with specified name:

:; ./zonemgr create-giz -z test-giz-1 -b SUNWipkg
  • Create a default DIZ with automatic name, cloning a previously created GIZ (or newly spawned, in its absence):

:; ./zonemgr create-diz -c test-giz-1
  • NOTE: At this time, cloning is incomplete.

    • Create a DIZ with some networks (DHCP is automatically enabled) and filesystems:

:; ./zonemgr create-diz -z test-diz-1 --mount-lofs "/export/home" \
   --delegate-dataset "pool/zones/_delegated/test-diz-1" \
   --add-vnic-over "etherstub127 e1000g0|81|00:12:34:56:78:ab"
  • NOTE: At this time, cloning and guessing of suitable GIZ are incomplete.

    • Create a DIZ with NFS-mounted paths:

:; ./zonemgr create-diz -z test-diz-1 \
   --mount-autonfs "/mnt/.ccache|filer:/var/.ccache-shared"
  • NOTE: At this time, cloning and guessing of suitable GIZ are incomplete.

  • NFS+AUTOFS may also be unimplemented yet.

    • Common house-keeping:

:; ./zonemgr list
:; ./zonemgr list-diz
:; ./zonemgr list-giz

:; ./zonemgr is-managed -z test-zone-1

:; ./zonemgr -q get-state -z test-zone-1
  • Runtime lifecycle of an installed zone:

:; ./zonemgr -z test-diz-1 boot
:; ./zonemgr -z test-diz-1 halt
:; ./zonemgr -z test-diz-1 shutdown
:; ./zonemgr -z test-diz-1 reboot
:; ./zonemgr -z test-diz-1 mount
:; ./zonemgr -z test-diz-1 umount
:; ./zonemgr -z test-diz-1 ready
  • Uninstall a zone:

:; ./zonemgr -z test-diz-1 destroy
  • Check zone-manifest consistency:

:; ./zonemgr -z test-diz-1 verify
  • Snapshots:

:; ./zonemgr snapshot -z test-giz-1 -s snap1
:; ./zonemgr rollback -z test-giz-1 -s snap1
:; ./zonemgr list-snapshots -z test-giz-1

:; ./zonemgr create-diz -c test-giz-1 -s snap1 -z test-diz-1
  • Execute command in a discardable zone:

:; ./zonemgr exec --exec-discard onsuccess -z build-diz-1 \
   --exec-cmd "cd /home/illumos-gate && ./nightly.sh"

:; ./zonemgr exec-diz -c build-giz-jdk \
   --exec-user jenkins --copy-users "jenkins jim" --elevate-users "jim" \
   --copy-groups jenkins \
   --mount-lofs "/export/oi-userland" \
   --add-vnic-over "etherstub127" \
   --exec-cmd "cd /export/oi-userland/components/java8 && gmake publish"
  • Debug a routine using some CLI settings:

:; ./zonemgr rollback -z test-giz-1 -s initial-install && \
   ./zonemgr -z test-giz-1 --copy-users "jim" --elevate-user jim \
       --mount-lofs /export/home DEVEL_DEBUG setup_zone_postinstall