This is the source repository for my homelab Kubernetes infrastructure, built from repurposed HP ProDesk 600 G3 Desktop Mini PCs. The cluster runs Talos Linux, a minimal and immutable operating system designed exclusively for Kubernetes. The purpose of this project is both to gain useful experience with Kubernetes and to build a platform for providing legitimate production-quality applications and services in my home.
The project is organized into different "stacks" of components based on their function in the cluster.
System stack components are fundamental to the cluster, delivering core functionality such as networking and persistent storage to applications.
system/argocd
- Continuous delivery of this repository using Argo CD
system/cert-manager
- Manages PKI certificates in the cluster with the cert-manager and trust-manager projects
system/cilium
- Cilium Container Network Interface (CNI) plugin handling all networking including access from outside the cluster using BGP.
system/cloudflare-gateway
- Cloudflare Gateway for Cloudflare tunnels via cloudflared and the Gateway API.
system/cnpg
- CloudNativePG operator for PostgreSQL databases
system/crossplane
- Crossplane control plane for managing non-Kubernetes resources
system/envoy-gateway
- Envoy Gateway for handling ingress connections into the cluster via the Gateway API.
system/external-secrets
- External Secrets Operator for secrets management
system/gateway-api
- Gateway API CRDs used by Envoy Gateway
system/k8s-gateway
- An outside-facing instance of CoreDNS with the k8s_gateway plugin for resolving DNS names from the LAN
system/keycloak
- Keycloak for identity and access management (IAM)
system/kube-prometheus
- Kubernetes monitoring platform with the Prometheus operator
system/kubevirt
- Virtualization platform with KubeVirt
system/kyverno
- Kyverno Kubernetes policy engine
system/mariadb-operator
- MariaDB Operator for MariaDB databases.
system/piraeus
- Container Storage Interface (CSI) plugin for LINSTOR managed with the Piraeus operator
system/velero
- Velero for Kubernetes resource and persistent volume backups
Application stack components provide usable functionality to end users and rely on components in the platform and system stacks.
apps/actual
- Actual Budget personal finance application practicing the envelope budgeting method.
apps/ezxss
- ezXSS platform for testing for XSS vulnerabilities, particularly useful for blind XSS injections.
apps/paperless-ngx
- Paperless-ngx document management system
Following Argo CD's app of apps pattern, the apps that Argo CD should sync are themselves managed by an app. This is the root app, which generates the desired manifests with Jsonnet to avoid duplication.
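As an illustration of the pattern, a parent Application in Argo CD might look roughly like the sketch below. The repository URL, path, and sync policy are placeholders chosen for the example and are not taken from this repository; Argo CD's directory source renders any `*.jsonnet` files it finds in the path.

```yaml
# Hypothetical root Application for an app of apps layout.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/homelab.git  # placeholder URL
    targetRevision: HEAD
    path: root  # assumed directory holding the Jsonnet that emits child Applications
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true     # assumed: remove child apps deleted from Git
      selfHeal: true  # assumed: revert manual drift
```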
Secrets in this repository are encrypted with SOPS and applied via Kustomize with the KSOPS plugin. Secrets are readable with my personal AGE private key as well as a private key created for the production Argo CD deployment.
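For readers unfamiliar with KSOPS, the wiring usually consists of a generator manifest that points at SOPS-encrypted files plus a reference to it from the kustomization. The sketch below follows the KSOPS documentation; the file names are hypothetical and may not match the ones used in this repository.

```yaml
# secret-generator.yaml (hypothetical file name)
apiVersion: viaduct.ai/v1
kind: ksops
metadata:
  name: secret-generator
  annotations:
    # Tells Kustomize to run the ksops binary as an exec KRM function.
    config.kubernetes.io/function: |
      exec:
        path: ksops
files:
  - ./secret.sops.yaml  # hypothetical SOPS-encrypted Secret manifest
---
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
generators:
  - ./secret-generator.yaml
```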
Tools for working with Talos Linux, Kubernetes, etc., are managed by this project's flake.nix. With direnv, all tools are installed and ready to use when navigating to the folder in a terminal.
A rudimentary shell script, hack/dev-cluster.sh, provisions a local development cluster using a containerized version of Talos.
💡 Note: To successfully deploy LINSTOR in the development cluster, the host Linux system must have the DRBD 9 kernel module installed.
The production cluster's configuration is generated with talhelper genconfig based on the talconfig.yaml and talsecret.sops.yaml files. talosctl is used to apply configuration, upgrade Talos Linux, and upgrade the Kubernetes version on the cluster based on the configuration generated by talhelper.
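To give a sense of what talhelper consumes, a heavily trimmed talconfig.yaml could look something like the following. Every value here (cluster name, addresses, hostname, disk) is a made-up placeholder, and the real file in this repository contains more nodes and settings.

```yaml
# Hypothetical, minimal talconfig.yaml for talhelper.
clusterName: homelab                # placeholder name
endpoint: https://192.0.2.10:6443   # placeholder Kubernetes API endpoint
nodes:
  - hostname: node1                 # placeholder hostname
    ipAddress: 192.0.2.11           # placeholder node address
    controlPlane: true
    installDisk: /dev/sda           # placeholder install target
```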
💡 Note: When booted from the installation media, the nodes will run in "maintenance" mode. Applying a configuration to them with talosctl will install Talos to the disk and attempt to join the cluster.
The first node of the etcd cluster must be bootstrapped manually with talosctl. Other nodes will then automatically join the cluster based on their applied configuration.
Working Secure Boot is required to enable secure TPM-backed disk encryption.
The nodes must be prepared to accept the Secure Boot keys provided by the Talos installer. The UEFI firmware must be configured to clear all existing Secure Boot keys to allow the Talos installer to apply Sidero's platform key to the system.
💡 Note: It is important to retain the Microsoft UEFI CA certificate in the signature database to continue to allow option ROMs (such as for display adapters) to load. On HP systems specifically, failing to do so will prevent normal access to the UEFI firmware interface.
Talos also requires TPM 2.0 to support TPM-backed disk encryption. While the HP ProDesk 600 G3 ships with TPM 1.2, HP provides a firmware update to convert to TPM 2.0.
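For reference, TPM-backed encryption of the system partitions is typically expressed in the Talos machine configuration roughly as shown below. This is a sketch based on the Talos documentation, not a copy of this cluster's configuration; which partitions are encrypted here is an assumption.

```yaml
# Sketch: encrypt the STATE and EPHEMERAL partitions with TPM-sealed keys.
machine:
  systemDiskEncryption:
    state:
      provider: luks2
      keys:
        - slot: 0
          tpm: {}   # key is sealed to the TPM and tied to Secure Boot state
    ephemeral:
      provider: luks2
      keys:
        - slot: 0
          tpm: {}
```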
The Secure Boot installation image must be obtained from the Talos Image Factory.
On first boot of the installer, use the Enroll Secure Boot keys: auto option in the boot options. Once the keys are applied, verify that the node is running in Secure Boot mode from the dashboard as well as with the talosctl get securitystate command.
The Talos Image Factory generates and signs images with a configurable set of extensions and kernel parameters. The following customization generates the schematic ID of a13c1e1cdb9e135b5ae8ca3e977a5bee91bb4a503493d9204b6433239f462799 used in the cluster:
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/drbd
      - siderolabs/i915-ucode
      - siderolabs/intel-ucode
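The schematic ID is what ties a node's installer image to this set of extensions. In a Talos machine configuration, the reference would look roughly like the sketch below; the version tag is left as a placeholder because it depends on the Talos release in use, and the exact image name may differ between Talos versions.

```yaml
# Sketch: point the installer at the Secure Boot image built for this schematic.
machine:
  install:
    # <talos-version> is a placeholder, e.g. the release the cluster currently runs.
    image: factory.talos.dev/installer-secureboot/a13c1e1cdb9e135b5ae8ca3e977a5bee91bb4a503493d9204b6433239f462799:<talos-version>
```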
The nodes obtain their addresses via DHCP, with reservations configured on the upstream DHCP server. The nodes also share a virtual IP, which provides highly available access to the Kubernetes API.
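In Talos machine configuration terms, DHCP plus a shared virtual IP on a control plane node can be sketched as follows. The interface name and the address are placeholders, not the values used in this cluster.

```yaml
# Sketch: DHCP on the primary interface with a shared virtual IP.
machine:
  network:
    interfaces:
      - interface: eth0     # placeholder interface name
        dhcp: true
        vip:
          ip: 192.0.2.10    # placeholder virtual IP for the Kubernetes API
```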
By default, Talos installs Flannel as the cluster's CNI. This repository depends on Cilium. Cilium cannot be installed directly by the Talos installer. Instead, the cluster is created with no CNI and then manually bootstrapped with Cilium.
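Creating the cluster without a CNI corresponds to the following Talos machine configuration setting. How it is wired into talconfig.yaml (for example as a patch) is not shown here and may differ in this repository.

```yaml
# Sketch: tell Talos not to install Flannel so Cilium can be bootstrapped manually.
cluster:
  network:
    cni:
      name: none
```

Until Cilium is installed, nodes will report as NotReady because no pod networking is available.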
For high availability of the Kubernetes API, but also to limit the required number of nodes, talconfig.yaml configures all three nodes as control plane nodes, but allows scheduling workloads on them.
💡 Note: While this is not strictly best practice, the alternative is losing high availability or purchasing additional worker nodes.
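In the generated Talos machine configuration, allowing workloads on control plane nodes comes down to a single cluster-level flag, sketched below. The way talconfig.yaml sets it in this repository may use talhelper's own syntax instead.

```yaml
# Sketch: allow regular workloads to be scheduled on control plane nodes.
cluster:
  allowSchedulingOnControlPlanes: true
```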
The default Pod Security Standards profile is hardened to the restricted profile with a configuration patch in talconfig.yaml. This is stricter than the baseline profile that Talos enforces by default.
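A patch of this kind generally adjusts the PodSecurity admission plugin configuration on the API server. The sketch below mirrors the structure documented by Talos with the enforce level raised to restricted; the exemptions shown are the Talos defaults, and the actual patch in this repository may differ.

```yaml
# Sketch: enforce the restricted Pod Security Standards profile cluster-wide.
cluster:
  apiServer:
    admissionControl:
      - name: PodSecurity
        configuration:
          apiVersion: pod-security.admission.config.k8s.io/v1alpha1
          kind: PodSecurityConfiguration
          defaults:
            enforce: restricted   # Talos default is baseline
            enforce-version: latest
            audit: restricted
            audit-version: latest
            warn: restricted
            warn-version: latest
          exemptions:
            namespaces:
              - kube-system       # Talos default exemption
            runtimeClasses: []
            usernames: []
```

Namespaces that need privileged workloads can still opt out individually with the `pod-security.kubernetes.io/enforce` label.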