/homelab-kube-cluster

Dan's Homelab Kubernetes Cluster - Operated through Kustomize & ArgoCD

Primary language: TypeScript. License: Creative Commons Zero v1.0 Universal (CC0-1.0).

Dan Manners' Homelab

All of the READMEs are in flux at the moment while I refactor much of the repository, but I'm happy to answer any questions in the k8s@Home Discord server. Ping me at danmanners@3077!

Current status: BETA (but highly stable)

This project aims to use industry-standard tooling and practices both to perform its functions and to serve as a reference repository for other people's learning and work.

🔍 Features

  • Easily replicable GitOps
  • Modularity: easy to add and remove components
  • Hybrid Multi-Cloud
  • External DNS updates
  • Automagic cert management
  • In-Cluster Container Registry
  • Monitoring and alerting 🚧

💡 Current Tech Stack

Name | Description
ArgoCD | GitOps for Kubernetes
AWS | Cloud Provider
Blocky | Fast and lightweight DNS proxy and ad-blocker
Buildah | Container Building
Cert-Manager | Certificate Manager
Cilium | CNI utilizing eBPF for Observability and Security
CloudNativePG | Kubernetes operator covering the lifecycle of HA PostgreSQL clusters
CSI-Driver-NFS | Kubernetes NFS Driver for persistent storage
Dex | Federated OIDC
External-DNS | Configures external DNS servers from Kubernetes resources
GitHub | Code hosting and management through Git
Grafana | Metrics Visualization
Helm | Kubernetes Package Management
Jenkins | Open-Source Automation Server
Kubernetes | Container Orchestration
Kyverno | Kubernetes-Native Policy Management
Let's Encrypt | Free TLS certificates
Maddy | Composable all-in-one mail server
MetalLB | Bare-metal Load Balancer for Kubernetes
Microsoft Azure | Cloud Provider
Mozilla SOPS | Simple and flexible tool for managing secrets
Podman | Container and Pod management
Prometheus | Metrics and Data Collection
Python | Programming Language
Raspberry Pi | Bare-metal ARM SoC Hardware!
Reloader | Kubernetes controller that watches ConfigMaps and Secrets and reloads the pods that use them
SonarQube | Static code analysis
Sonatype Nexus-OSS | Manage binaries and build artifacts
Talos | Secure, immutable, and minimal Linux OS
Tekton | Cloud-Native CI/CD
Terraform | Open-Source Infrastructure-as-Code
Terragrunt | Making Terraform DRY
Ubuntu | Operating System
Uptime Kuma | Fancy self-hosted system monitoring
Vaultwarden | Unofficial Bitwarden-compatible server written in Rust (formerly bitwarden_rs)
WikiJS | Open-Source Wiki/Documentation Service

Services Hosted

Name | Description | Path | Relevant Link
Excalidraw | Easy whiteboarding with excellent shortcuts! | manifests/workloads/excalidraw | GitHub - excalidraw/excalidraw
Jenkins OSS | An older tool sir, but it checks out. | manifests/workloads/jenkins-oss | Website
Kube-Prometheus-Stack | Easy to deploy Grafana, Prometheus rules, and the Prometheus Operator. | manifests/workloads/kube-prometheus-stack-grafana | GitHub - prometheus-community/helm-charts
Memegen | The free and open source API to generate memes. | manifests/workloads/memegen | GitHub - jacebrowning/memegen
Node-Feature-Discovery | Node feature discovery for Kubernetes | manifests/workloads/node-feature-discovery | GitHub - kubernetes-sigs/node-feature-discovery
OpenFaaS | Serverless functions, made simple! | manifests/workloads/openfaas-ingress | Website
SonarQube OSS | Code quality and code security | manifests/workloads/sonarqube-oss | Website
Spiderfoot | Automated OSINT webcrawling | manifests/workloads/spiderfoot | Website
Traefik | Cloud native application proxying; simplifying network complexity | manifests/bootstrapping/traefik | Website
WikiJS | The most powerful and extensible open source Wiki software | manifests/workloads/wikijs | Website

Deployment Order of Operations

This section is a work in progress, but it lists the core services that must be deployed and the order in which to deploy them.

  1. Talos Linux
  2. Cilium CNI
  3. MetalLB
  4. Cert-Manager
  5. External-DNS
  6. Traefik
  7. ArgoCD - Part One
  8. ArgoCD - Part Two

Identifying Problems, Troubleshooting Steps, and more

Below are a few tips that may help when troubleshooting or getting things up and running.

Traffic is not getting from the edge (cloud) nodes to the on-prem cluster network

You can check whether traffic from a remote node is making it on site by running dig inside a netshoot container scheduled on that node:

# Replace talos-aws-grav01 with the remote (cloud) node you want to test from.
kubectl run temp-troubleshooting \
  --rm -it -n default \
  --overrides='{"apiVersion":"v1","spec":{"nodeSelector":{"kubernetes.io/hostname":"talos-aws-grav01"}}}' \
  --pod-running-timeout 3m \
  --image=docker.io/nicolaka/netshoot:latest \
  --command -- /bin/bash

From inside that pod, confirm that the remote node can reach CoreDNS or another pod or Service IP.
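One quick way to do that from the netshoot shell is to query CoreDNS directly with dig. This sketch assumes CoreDNS is exposed through the conventional `kube-dns` Service in `kube-system` with pods labelled `k8s-app=kube-dns`, and that the cluster domain is `cluster.local`; `check_dns` is a hypothetical helper, and you can pass it either the Service IP or an individual CoreDNS pod IP.

```shell
# Look up candidate IPs from your workstation (ASSUMPTION: standard kube-dns
# Service and k8s-app=kube-dns pod label):
#   kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}'
#   kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# Hypothetical helper to run inside the netshoot pod: succeeds only if the
# given DNS server IP answers a query for an in-cluster name.
check_dns() {
  local server="$1" name="${2:-kubernetes.default.svc.cluster.local}" answer
  # dig exits non-zero when no server replies; +short keeps only the answer.
  if answer="$(dig @"$server" "$name" +short +time=2 +tries=1)" && [ -n "$answer" ]; then
    echo "OK: $server resolved $name -> $answer"
  else
    echo "FAIL: no answer from $server for $name" >&2
    return 1
  fi
}
```

Run it as `check_dns <coredns-ip>`. If the same IP answers from an on-prem node but not from the cloud node, the problem most likely lies in the cross-site path rather than in CoreDNS itself.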

If you can confirm that it is not working, try restarting the Cilium agents on every node:

kubectl rollout restart -n kube-system daemonset cilium

To-Do Items

  • Ensure that ALL services are tagged for the appropriate hardware architecture (arm64 or amd64) so that they run successfully
    • Alternatively, ensure that all containers are built as multi-architecture images.
  • Ensure that ALL application and service subdirectories have READMEs explaining what they do and what someone else may need to modify for their own environment

Gratitude and Thanks

This README redesign was inspired by several other homelab repos, individuals, and communities.

Individuals


Communities


  • The DevOps Lounge (Discord)
  • K8s-at-Home (Discord)

Without the inspiration and help of these individuals and communities, I don't think my own project would be nearly as far along. Make sure to check out their projects as well!