
Primary language: YAML. License: MIT.

My Kubernetes Homelab

This is the source repository for my homelab Kubernetes infrastructure, built from repurposed HP ProDesk 600 G3 Desktop Mini PCs. The cluster runs Talos Linux, a minimal and immutable operating system designed exclusively for Kubernetes. The purpose of this project is both to gain useful experience with Kubernetes and to build a platform for running production-quality applications and services in my home.

The project is organized into different "stacks" of components based on their function in the cluster.

System stack

System stack components are fundamental to the cluster. They deliver core functionality, including networking and persistent storage, to applications.

Application stack

Application stack components provide usable functionality to end users and rely on components in the platform and system stacks.

Root app

Following Argo CD's app of apps pattern, the apps that Argo CD should sync are themselves managed by an app. This root app generates the desired manifests with Jsonnet to avoid duplication.
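As a rough illustration, each manifest produced for the root app would be an Argo CD Application along these lines. The name, repository URL, path, namespaces, and sync policy are placeholders and assumptions, not values taken from this repository, and the real manifests are generated by Jsonnet rather than written by hand.

```yaml
# Hypothetical child Application of the kind a root app manages.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app              # placeholder name
  namespace: argocd              # assumes Argo CD runs in the "argocd" namespace
spec:
  project: default
  source:
    repoURL: https://example.com/me/homelab.git   # placeholder URL
    targetRevision: HEAD
    path: applications/example-app                # placeholder path
  destination:
    server: https://kubernetes.default.svc        # the in-cluster API server
    namespace: example-app                        # placeholder namespace
  syncPolicy:
    automated:
      prune: true      # remove resources that were deleted from Git
      selfHeal: true   # revert manual changes made in the cluster
```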

Secrets management

Secrets in this repository are encrypted with SOPS and applied via Kustomize with the KSOPS plugin. Secrets can be decrypted with my personal age private key as well as with a private key created for the production Argo CD deployment.
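For readers unfamiliar with KSOPS, the wiring usually looks something like the sketch below: a generator manifest lists the encrypted files, and a kustomization references the generator. The two documents would live in separate files, and all file and resource names here are hypothetical.

```yaml
# secret-generator.yaml (hypothetical file name): runs KSOPS on the listed files.
apiVersion: viaduct.ai/v1
kind: ksops
metadata:
  name: example-secret-generator   # placeholder name
  annotations:
    config.kubernetes.io/function: |
      exec:
        path: ksops
files:
  - ./secret.sops.yaml             # placeholder path to a SOPS-encrypted Secret
---
# kustomization.yaml: registers the generator above.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
  - ./secret-generator.yaml
```

When Kustomize builds this directory with the plugin available, KSOPS decrypts the file and emits the plain Secret, so the decryption key only has to exist where the build runs.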

Development environment

Tools for working with Talos Linux, Kubernetes, and related software are managed by this project's flake.nix. With direnv, all tools are installed and ready to use as soon as you navigate to the folder in a terminal.

A rudimentary shell script, hack/dev-cluster.sh, provisions a local development cluster using a containerized version of Talos.

💡 Note: To successfully deploy LINSTOR in the development cluster, the host Linux system must have the DRBD 9 kernel module installed.

Talos configuration

The production cluster's configuration is generated with talhelper genconfig based on the talconfig.yaml and talsecret.sops.yaml files. talosctl is used to apply configuration, upgrade Talos Linux, and upgrade the Kubernetes version on the cluster based on the configuration generated by talhelper.
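To give a sense of what talhelper consumes, a minimal talconfig.yaml might look like the sketch below. Every value is a placeholder chosen for illustration (cluster name, endpoint, hostnames, addresses, install disk); the actual file in this repository will differ.

```yaml
# Hypothetical, trimmed-down talconfig.yaml for a three-node cluster.
clusterName: homelab                 # placeholder
endpoint: https://192.0.2.10:6443    # placeholder Kubernetes API endpoint
nodes:
  - hostname: node-1                 # placeholder
    ipAddress: 192.0.2.11            # placeholder
    controlPlane: true
    installDisk: /dev/sda            # placeholder
  - hostname: node-2
    ipAddress: 192.0.2.12
    controlPlane: true
    installDisk: /dev/sda
  - hostname: node-3
    ipAddress: 192.0.2.13
    controlPlane: true
    installDisk: /dev/sda
```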

💡 Note: When booted from the installation media, the nodes will run in "maintenance" mode. Applying a configuration to them with talosctl will install Talos to the disk and attempt to join the cluster.

The first node of the etcd cluster must be bootstrapped manually with talosctl. Other nodes will then automatically join the cluster based on their applied configuration.

Disk encryption and UEFI Secure Boot

Working Secure Boot is required to enable secure TPM-backed disk encryption.
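In Talos, TPM-backed encryption is enabled through the machine configuration. A patch along the following lines is a typical sketch; whether this repository encrypts both partitions, and with which key slots, is an assumption.

```yaml
# Sketch of a Talos machine config patch enabling TPM-sealed LUKS2 encryption.
machine:
  systemDiskEncryption:
    state:                 # the STATE partition (holds the machine configuration)
      provider: luks2
      keys:
        - slot: 0
          tpm: {}          # seal the key to the TPM
    ephemeral:             # the EPHEMERAL partition (container and pod data)
      provider: luks2
      keys:
        - slot: 0
          tpm: {}
```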

Node preparation

The nodes must be prepared to accept the Secure Boot keys provided by the Talos installer. The UEFI firmware must be configured to clear all existing Secure Boot keys to allow the Talos installer to apply Sidero's platform key to the system.

💡 Note: It is important to retain the Microsoft UEFI CA certificate in the signature database to continue to allow option ROMs (such as for display adapters) to load. On HP systems specifically, failing to do so will prevent normal access to the UEFI firmware interface.

Talos also requires TPM 2.0 to support TPM-backed disk encryption. While the HP ProDesk 600 G3 ships with TPM 1.2, HP provides a firmware update to convert to TPM 2.0.

Talos installer

The Secure Boot installation image must be obtained from the Talos Image Factory.

On first boot of the installer, choose the Enroll Secure Boot keys: auto option from the boot menu. Once the keys are enrolled, verify that the node is running in Secure Boot mode, either from the dashboard or with the talosctl get securitystate command.

Image Factory schematic

The Talos Image Factory generates and signs images with a configurable set of extensions and kernel parameters. The following customization generates the schematic ID of a13c1e1cdb9e135b5ae8ca3e977a5bee91bb4a503493d9204b6433239f462799 used in the cluster:

customization:
  systemExtensions:
    officialExtensions:
    - siderolabs/drbd
    - siderolabs/i915-ucode
    - siderolabs/intel-ucode
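For the extensions to survive installation and upgrades, the machine configuration has to reference an installer image built from the same schematic. A sketch of such a patch follows; the version tag is left as a placeholder, and the exact image name can vary between Talos releases, so treat the path as an assumption to check against the Image Factory.

```yaml
# Sketch: point the installer at the Secure Boot image for this schematic.
machine:
  install:
    # <talos-version> is a placeholder, not a value from this repository.
    image: factory.talos.dev/installer-secureboot/a13c1e1cdb9e135b5ae8ca3e977a5bee91bb4a503493d9204b6433239f462799:<talos-version>
```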

Network configuration

The nodes obtain their addresses over DHCP, with reservations defined on the upstream DHCP server. They also share a virtual IP, which provides highly available access to the Kubernetes API.
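A machine config patch for this kind of setup could look like the sketch below. The virtual IP is a placeholder from a documentation address range, and selecting the interface with a device selector is an assumption about how the nodes are matched.

```yaml
# Sketch of a control plane patch: DHCP on the physical NIC plus a shared VIP.
machine:
  network:
    interfaces:
      - deviceSelector:
          physical: true     # assumes a single physical NIC per node
        dhcp: true
        vip:
          ip: 192.0.2.10     # placeholder virtual IP for the Kubernetes API
```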

CNI configuration

By default, Talos installs Flannel as the cluster's CNI. This repository depends on Cilium. Cilium cannot be installed directly by the Talos installer. Instead, the cluster is created with no CNI and then manually bootstrapped with Cilium.
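Disabling the default CNI is a small cluster-level setting. The patch below is a minimal sketch of it; any additional Cilium-related settings used in this repository are not shown.

```yaml
# Sketch: tell Talos not to install a CNI so Cilium can be bootstrapped afterwards.
cluster:
  network:
    cni:
      name: none
```

Until Cilium is installed, nodes will report NotReady because no pod network exists yet.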

Control plane scheduling

To keep the Kubernetes API highly available while limiting the number of nodes required, talconfig.yaml configures all three nodes as control plane nodes and allows workloads to be scheduled on them.

💡 Note: While this is not strictly best practice, the alternative is losing high availability or purchasing additional worker nodes.
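In Talos machine configuration terms, this behavior corresponds to a single cluster-level flag. The patch below is a sketch of the underlying setting; how talconfig.yaml expresses it in this repository may differ.

```yaml
# Sketch: allow regular workloads to be scheduled on control plane nodes.
cluster:
  allowSchedulingOnControlPlanes: true
```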

Pod security

A configuration patch in talconfig.yaml hardens the default Pod Security Standards profile to restricted. This is stricter than the baseline profile that Talos enforces by default.
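A patch of this kind typically overrides the PodSecurity admission plugin defaults. The sketch below assumes that audit and warn are also set to restricted and that kube-system stays exempt; the actual patch in talconfig.yaml may choose differently.

```yaml
# Sketch: enforce the "restricted" Pod Security Standard cluster-wide.
cluster:
  apiServer:
    admissionControl:
      - name: PodSecurity
        configuration:
          apiVersion: pod-security.admission.config.k8s.io/v1alpha1
          kind: PodSecurityConfiguration
          defaults:
            enforce: restricted        # raised from "baseline"
            enforce-version: latest
            audit: restricted
            audit-version: latest
            warn: restricted
            warn-version: latest
          exemptions:
            namespaces:
              - kube-system            # assumed exemption for system components
```

Workloads that need elevated privileges then have to run in a namespace labeled with a less strict `pod-security.kubernetes.io/enforce` level.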