ceil: Auto-provisioned RPi cluster running K8S on bare-metal

Enter make help to see available commands.

Why the name? intval(ceil(M_PI)) === 4, which is the number of k8s nodes in the ceil cluster - flowers to mlande for gifting the name.
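
A quick check of the name's arithmetic (requires a local PHP CLI):

    php -r 'var_dump(intval(ceil(M_PI)) === 4);'   # prints bool(true)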

See branch max for the Mini PC (amd64) variant.

Author: Helmut Hoffer von Ankershoffen né Oertel

Goals

  • Set up an auto-provisioned RPi cluster running K8S on bare-metal behind an RPi acting as a router
  • Educate myself on Ansible + RPi + K8S + GitOps for CI/CD/PD from bottom to top
  • Refresh knowledge regarding networking and Python
  • Build an enhanced PHP/SF4 stack for K8S supporting HPA, progressive deployments and A/B testing

Tasks

Phase 0: Hardware

  • Wire up RPi rack and accessories

Phase 1: Foundation

  • Central CloudOps entrypoint is make
  • Flashing of RPis and automatic provisioning with pre-configured base OS
  • Setup and teardown of all steps individually
  • Setup and teardown in one step
  • Setup of k8s cluster on RPis using Ansible inc. weave networking and k8s dashboard
  • Helm/tiller for additional deployments (see the bootstrap sketch after this list)
  • Traefik as ingress inc. Traefik dashboard
  • busybox-http using Traefik as ingress for demos
  • Grafana and Prometheus
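
The Helm/tiller step predates Helm v3; a minimal bootstrap sketch for an RBAC-enabled Tiller with Helm v2 (the service account name is an assumption, the repo's Ansible role may differ):

    # create a service account for Tiller and grant it cluster-admin
    kubectl -n kube-system create serviceaccount tiller
    kubectl create clusterrolebinding tiller \
      --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
    # install Tiller into the cluster using that service account
    helm init --service-account tiller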

Phase 2: Storage and Loadbalancing

  • Dynamic volume provisioning using Heketi + GlusterFS spanning thumb drives (see the StorageClass sketch after this list)
  • Enable persistence for Grafana and Prometheus
  • MetalLB as LoadBalancer service
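
Dynamic provisioning with Heketi hinges on a StorageClass pointing at the Heketi REST endpoint; a minimal sketch using the in-tree GlusterFS provisioner (class name and resturl are assumptions, the repo's manifests may differ):

    # register a StorageClass backed by Heketi-managed GlusterFS
    kubectl apply -f - <<'EOF'
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: glusterfs                        # assumed name
    provisioner: kubernetes.io/glusterfs
    parameters:
      resturl: "http://heketi.example:8080"  # hypothetical Heketi endpoint
    EOF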

Phase 3: Router

  • Act as DHCP client using dhcpcd
  • Act as DHCP & DNS server for K8S subnet using dnsmasq
  • Act as gateway from wlan0 (WiFi) to eth0 (K8S subnet) using iptables (see the sketch after this list)
  • Act as VPN server using OpenVPN
  • Dynamically update domain vpn.ceil.pro (or similar) using ddclient and Cloudflare v4 API
  • Raise firewall using ufw
  • Act as Docker registry mirror using official docker image registry:2
  • Act as private Docker registry
  • kail and harbor
  • ngrok
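
The gateway role boils down to IP forwarding plus NAT; a minimal sketch of the iptables rules (the repo's Ansible role presumably persists them, exact rules may differ):

    # enable IPv4 forwarding
    sudo sysctl -w net.ipv4.ip_forward=1
    # masquerade traffic from the K8S subnet leaving via WiFi
    sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
    # allow replies back into the K8S subnet and all traffic out of it
    sudo iptables -A FORWARD -i wlan0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
    sudo iptables -A FORWARD -i eth0 -o wlan0 -j ACCEPT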

Phase 4: PiWatch

  • Play with PiTraffic Lights mounted on top of ceil-router
  • Deploy kubewatch to push K8S events to arbitrary webhook
  • Build a dockerized Python/FastAPI (ASGI) webapp, PiWatch, that provides a webhook for kubewatch and triggers PiTraffic as an audiovisual event handler for K8S (see the smoke test after this list)
  • Refine PiWatch to react with finer granularity to specific K8S events
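
kubewatch POSTs JSON events to the configured webhook, so PiWatch can be smoke-tested without a cluster; a hedged sketch (URL, port and payload fields are assumptions):

    # simulate a kubewatch event against a locally running PiWatch
    curl -X POST http://localhost:8000/ \
      -H 'Content-Type: application/json' \
      -d '{"eventmeta": {"kind": "pod", "name": "demo", "namespace": "default", "reason": "created"}}'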

Phase 5: PiPHP

  • Deploy the custom-built base image arm32v7-docker-php-apache to k8s from the private registry provided by the router. Further progress of the base image is tracked in its respective repository.
  • Prepare Helmut's Helm Chart Repository hosted on GitHub Pages.
  • Prepare the PiPHP docker image based on said base image inc. helm chart and redeploy. Further progress of said app is tracked in said repository.
  • Automate build->deploy workflow inc. helming locally.
  • Automate full CI/CD workflow with GitHub Actions or similar.

Phase 6: Auto-Scaling

  • Autoscaling using HPA and custom metrics (see the sketch after this list)
  • Zero-Scaling using Osiris
  • Relevant dashboards in Grafana
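
For the plain CPU-based case an HPA is a one-liner; custom metrics additionally require a metrics adapter (the deployment name piphp is hypothetical):

    # scale between 1 and 4 replicas, targeting 80% average CPU utilization
    kubectl autoscale deployment piphp --min=1 --max=4 --cpu-percent=80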

Phase 7: Mesh-Networking (waiting for ARM images from CNCF et al)

  • Istio for Mesh-Networking
  • Visibility tools
  • Additional tools

Phase 8: GitOps and Progressive Delivery (waiting for ARM images from CNCF et al)

  • Flagger for Helm using mesh network
  • Canary deployments using mesh network
  • ...

Phase 9: CI and ephemeral test environments (waiting for ARM images from CNCF et al)

  • Setup CI using JenkinsX
  • ...

Phase 10: A/B testing (waiting for ARM images from CNCF et al)

  • Using mesh network
  • ...

Phase 11: Sharing is caring

  • Open source under GPLv3
  • Links to useful material for further studies
  • GitHub Page
  • Prepare an interactive install script automating the manual step of copying and editing .tpl files
  • Write a series of blog posts
  • Prepare a workshop presentation
  • Educate peers in meetups

Layers and tools

  • CloudOps
    • Workstation: MacBook Pro
    • Package manager: Homebrew
    • Flash-Tool for OS of RPis: Hypriot Flash
    • Entrypoints: make and kubectl (GitOps in second step)
  • Hardware
    • SBCs: 5x Raspberry Pi 3B+
    • Storage: 5x 128GiB SD cards (containers), 5x 128GiB USB thumb drives (volumes)
    • Rack: transparent
    • Networking: 5-port Gbit/s switch for the K8S subnet + the home or company WiFi router that ceil-router uplinks to
    • Power: 6-port USB charger powering the switch and RPis
    • 4-dir traffic lights with beeper and button: PiTraffic
  • Software
    • OS: Debian, Hypriot distribution
    • Networking for router: iptables, dhcpcd, dnsmasq, OpenVPN, ddclient, Cloudflare
    • Configuration management: Ansible
    • Orchestration: Kubernetes (K8S)
    • K8S installation: kubeadm
    • Networking: weave
    • Persistence: GlusterFS + Heketi for dynamic volume provisioning
    • Ingress: Traefik
    • Loadbalancer: MetalLB
    • Deployments: helm
    • Monitoring and Dashboarding: prometheus, grafana
    • Traffic lights: kubewatch, Python, FastAPI, PiTraffic, RPi.GPIO

Install this repository

  1. Fork this repository and clone to your workstation
  2. Walk all files with the suffix .tpl: create a copy in the same directory without the suffix and fill in your specifics where prompted by the capitalized placeholders
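
A shell one-liner that does the copying (editing the placeholders stays manual):

    # copy every *.tpl next to itself without the suffix, skipping existing copies
    find . -name '*.tpl' | while read -r tpl; do
      [ -e "${tpl%.tpl}" ] || cp "$tpl" "${tpl%.tpl}"
    done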

Provision RPIs

  1. Prepare your workstation by installing Ansible, kubectl, helm etc. using Homebrew: make prepare-mac
  2. Pull the hypriot image (which is not stored in GitHub): make pull-image
  3. Flash the RPis (insert the SD cards into your workstation): make {router,one,two,three,four}-provision (see the sketch after this list)
  4. Insert the SD cards into the slots of the respective RPis
  5. Insert the thumb drives into the USB ports of the RPis
  6. Start the RPis by plugging in the USB charger
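
Under the hood the provision targets presumably wrap Hypriot's flash tool; a hedged sketch for a single node (image file name and flags are assumptions about this repo's usage):

    # flash the Hypriot image to the inserted SD card and bake in the hostname
    flash --hostname ceil-one hypriotos-rpi-v1.9.0.img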

Setup router

  1. Make a DHCP reservation for ceil-router with IP address 192.168.0.100 on your home or company WiFi router - it will register there as ceil-router
  2. Set up a static route to the k8s subnet 11.0.0.0/25 with 192.168.0.100 as gateway in your company or home WiFi router - if this is not achievable, use make workstation-route-add to add the route on your workstation (see the sketch after this list)
  3. For VPN, set up port forwarding (sometimes called "virtual server") in your company or home WiFi router for port 1194 (or whatever you configured in router/roles/vpn/defaults/main.yml) to 192.168.0.100
  4. Add 192.168.0.100 as the first nameserver for the (WiFi) connection of your workstation using system settings
  5. Reboot ceil-router to pick up its IP address via make router-reboot - it will register via ZeroConf/Avahi on your workstation as ceil-router.local
  6. Check via make router-check-ip whether the IP address has been picked up
  7. Set up networking services on the router using make router-setup
  8. Wait for a minute, then check whether the k8s nodes (ceil-{one,two,three,four}.dev) have picked up their designated IP addresses from the router in the range 11.0.0.101 to 11.0.0.104: make k8s-check-ip
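
On a macOS workstation (the setup assumed under "Layers and tools"), steps 2 and 4 can also be done from the shell; a hedged sketch (what make workstation-route-add actually runs may differ):

    # add a static route to the K8S subnet via ceil-router
    sudo route -n add -net 11.0.0.0/25 192.168.0.100
    # put ceil-router first in the DNS list - this replaces the list, so append
    # your previous nameservers after it (service name "Wi-Fi" may differ)
    sudo networksetup -setdnsservers Wi-Fi 192.168.0.100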

Notes:

  • Danger: wipes thumb drive in router
  • It might take some time until Zeroconf/Avahi has distributed the name ceil-router.local in your network. You can check by ssh'ing into the router via make router-ssh
  • The router manages and routes to the subnet 11.0.0.0/25 the K8S nodes live in, and acts as their DHCP and DNS server
  • Furthermore, the router acts as an OpenVPN server and updates the IP address of vpn.ceil.pro via DDNS
  • After setting up the router wait for a minute to check if the k8s nodes have picked up the designated IPs using make k8s-check-ip
  • After the k8s nodes have picked up their IP addresses, you can ssh into them using make {one,two,three,four}-ssh
  • If on your workstation nslookup ceil-{one,two,three,four}.dev works but ping ceil-{one,two,three,four}.dev does not, reestablish the (WiFi) connection of your workstation
  • If you want to play with the traffic lights mounted on top of the router: make router-traffic
  • The last step of the router setup is building PiWatch, which takes ca. 15 minutes on the first build
  • Last but not least, the router provides a Docker registry mirror and a private Docker registry consumed by the K8S nodes

Setup K8S and execute all deployments

  1. Execute make setup to set up K8S inc. persistence and deploy everything at once - takes ca. 45 minutes.

Notes:

  • ceil-one is set up as k8s master
  • Danger: wipes thumb drives for setting up GlusterFS.
  • Because of memory constraints, GlusterFS spans ceil-two to ceil-four but not ceil-one

Alternatively, you can execute the setup and deployment steps one by one as described below

Interact, open dashboards and UIs

  1. Establish proxy to cluster (leave open in separate terminal): make k8s-proxy
  2. List nodes: make nodes-show
  3. List pods: make pods-show
  4. Generate a bearer token for accessing the K8S dashboard: make k8s-dashboard-bearer-token-show (see the sketch after this list)
  5. Access the K8S dashboard in your browser and enter the token: make k8s-dashboard-open
  6. Open Traefik UI in your browser: make traefik-ui-open
  7. Show webpage in your browser: make httpd-open
  8. Open Prometheus UI in your browser: make prometheus-open
  9. Open Grafana dashboards in your browser: make grafana-open
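
The token target likely boils down to reading the secret of a dashboard admin service account; a hedged sketch (the admin-user name depends on how that service account was created):

    # print the bearer token of the dashboard service account
    kubectl -n kube-system describe secret \
      "$(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')" | grep '^token:'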

Notes:

  • Add the contents of workstation/etc/hosts to /etc/hosts of your workstation for steps 6 to 9
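
A one-liner for that (appends, so run it only once):

    sudo sh -c 'cat workstation/etc/hosts >> /etc/hosts'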

Setup K8S inc. persistence and helm/tiller

  1. Set up the K8S cluster inc. persistence via GlusterFS+Heketi and helm/tiller for later deployments: make k8s-setup.

Notes:

  • ceil-one is set up as k8s master
  • Danger: wipes thumb drives for setting up GlusterFS.
  • Because of memory constraints, GlusterFS spans ceil-two to ceil-four but not ceil-one

Deploy

  1. Execute all deployments using make all-deploy or deploy step by step as documented below.
  2. Interact, open dashboards and UIs as documented above.

Delete deployments

  1. All deployments provide an individual make target for deleting the deployment, e.g. ngrok-delete. Execute make help to see all commands.
  2. Execute make all-delete to delete all deployments at once

Remove K8S inc. persistence and helm/tiller

  1. Execute make k8s-remove.

Teardown

  1. Execute make teardown to delete all deployments and remove K8S.

Obstacles

  • Examples for setting up K8S on bare metal are mostly outdated, incomplete, based on undocumented assumptions, or do not use Ansible correctly => full rewrite
  • The current Hypriot kernel does not set up the pid cgroup, which newer K8S versions use for QoS => downgrade K8S
  • RBAC is rather new and not yet accounted for in the deployment procedures of all tools and services => amend
  • The Hypriot Traefik image is outdated and its dashboard is not usable => use the original image, given manifest lists
  • Some services do not yet compile docker images for ARM and/or do not use docker manifest lists properly => google for alternative images or wait for CNCF
  • Most Ansible playbooks do not provide a teardown role => build your own

Additional references