Terraform RKE2 OpenStack

Easily deploy a high-availability RKE2 Kubernetes cluster on OpenStack providers (e.g. Infomaniak, OVH, etc.). This project aims to offer a simple and stable distribution rather than supporting every configuration possibility.

Inspired by and reworked from remche/terraform-openstack-rke2 to add an easier interface, high availability, load balancing and sensible defaults for running production workloads.

Features

  • RKE2 Kubernetes distribution: lightweight, stable, simple and secure
  • persisted /var/lib/rancher/rke2 when there is a single server
  • automated etcd snapshots to OpenStack Swift or any other S3-compatible backend
  • smooth updates & automatic removal of agent nodes with pod draining
  • integrated OpenStack Cloud Controller (load-balancer, etc.) and Cinder CSI
  • Cilium networking (network policy support and no kube-proxy)
  • highly available via kube-vip and dynamic peering (no load-balancer required, see the module sketch after this list)
  • out-of-the-box support for volume snapshots and Velero
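
These features are driven from a single Terraform module block. A minimal sketch is shown below; ff_write_kubeconfig and additional_san are inputs referenced elsewhere in this document, while the remaining settings are only placeholder comments to adapt from the module documentation on the Terraform Registry.

module "rke2" {
  source  = "zifeo/rke2/openstack"
  version = "3.0.0"

  # OpenStack credentials, image and flavor settings go here,
  # together with the server pools (3 for HA) and agent pools (see examples/)
  # ...

  # write the kubeconfig locally, requires rsync and yq (see the note in Getting started)
  ff_write_kubeconfig = true

  # extra SANs for the Kubernetes API certificate, e.g. a public DNS name
  additional_san = []
}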

Versioning

Component Version
OpenStack 2023.1 Antelope (verified); older versions may also work
RKE2 v1.29.0+rke2r1
OpenStack Cloud Controller v1.28.1
OpenStack Cinder v1.28.1
Velero v6.0.0
Kube-vip v0.7.2

Getting started

git clone git@github.com:zifeo/terraform-openstack-rke2.git && cd terraform-openstack-rke2/examples/single-server
cat <<EOF > terraform.tfvars
project  = "PCP-XXXXXXXX"
username = "PCU-XXXXXXXX"
password = "XXXXXXXX"
EOF

terraform init
terraform apply # approx 2-3 mins
kubectl --kubeconfig single-server.rke2.yaml get nodes
# NAME           STATUS   ROLES                       AGE     VERSION
# k8s-pool-a-1   Ready    <none>                      119s    v1.21.5+rke2r2
# k8s-server-1   Ready    control-plane,etcd,master   2m22s   v1.21.5+rke2r2

# get SSH and restore helpers
terraform output -json

# on upgrade, process node pool by node pool
terraform apply -target='module.rke2.module.servers["server-a"]'

See examples for more options or this article for a step-by-step tutorial.

Note: generating the remote kubeconfig file requires rsync and yq. You can disable this behavior by setting ff_write_kubeconfig=false and fetching /etc/rancher/rke2/rke2.yaml from a server node yourself.
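
In that case, a possible way to fetch and adapt the kubeconfig manually looks like the following sketch (it assumes the default ubuntu user, a reachable floating IP and yq v4; adapt to your setup):

ssh ubuntu@FLOATING_IP sudo cat /etc/rancher/rke2/rke2.yaml > cluster.rke2.yaml
# point the kubeconfig at the public endpoint instead of 127.0.0.1
yq -i '.clusters[0].cluster.server = "https://FLOATING_IP:6443"' cluster.rke2.yaml
kubectl --kubeconfig cluster.rke2.yaml get nodes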

Restoring a backup

# ssh into one of the server nodes (see terraform output -json)
# remove the server url from the rke2 config
sudo vim /etc/rancher/rke2/config.yaml
# restore the s3 snapshot (see the restore_cmd output of the terraform module):
sudo systemctl stop rke2-server
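# optionally list the snapshots stored in the bucket to find SNAPSHOT_PATH
# (a sketch: the flags mirror the restore command below, check `rke2 etcd-snapshot --help` on your version)
sudo rke2 etcd-snapshot list --etcd-s3 --etcd-s3-bucket=BUCKET_NAME --etcd-s3-access-key=ACCESS_KEY --etcd-s3-secret-key=SECRET_KEY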
sudo rke2 server --cluster-reset --etcd-s3 --etcd-s3-bucket=BUCKET_NAME --etcd-s3-access-key=ACCESS_KEY --etcd-s3-secret-key=SECRET_KEY --cluster-reset-restore-path=SNAPSHOT_PATH
sudo systemctl start rke2-server
# exit and ssh into the other server nodes to remove their etcd db
# (recall that you may need to ssh into one node as a bastion and then to the others):
sudo systemctl stop rke2-server
sudo rm -rf /var/lib/rancher/rke2/server
sudo systemctl start rke2-server
# reboot all nodes one by one to make sure all is stable
sudo reboot

Infomaniak OpenStack

A stable, performant and fully equipped Kubernetes cluster in Switzerland for as little as CHF 18.—/month (at the time of writing):

  • 1 server 2cpu/4Go (= master)
  • 1 agent 1cpu/2Go (= worker)
  • 1 floating IP for admin access (ssh and kubernetes api)
  • 1 floating IP for private network gateway
Flavour CHF/month
1×2cpu/4Go server with 1×1cpu/2Go worker, i.e. 5.88 + 2.93 (instances) + 0.09×2×(6+8) (block storage) + 2×3.34 (IP) 18.—
1×2cpu/4Go server with 1×4cpu/16Go worker ~28.—
3×2cpu/4Go HA servers with 1×4cpu/16Go worker ~41.—
3×2cpu/4Go HA servers with 3×4cpu/16Go workers ~76.—

You may also want to add a load-balancer and bind an additional floating IP for public access (e.g. for an ingress controller like ingress-nginx); that will add 10.00 (load-balancer) + 3.34 (IP) = CHF 13.34/month. Note that a single physical load-balancer can be shared by many Kubernetes LoadBalancer services as long as there is no port collision.
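
With the integrated OpenStack Cloud Controller, such a load-balancer is created whenever a Service of type LoadBalancer is declared. A minimal sketch follows; the name, selector, ports and FLOATING_NETWORK_ID are placeholders, and the annotation should be checked against the cloud-provider-openstack documentation for your version:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-public
  annotations:
    # request a floating IP from the external network (id is a placeholder)
    loadbalancer.openstack.org/floating-network-id: FLOATING_NETWORK_ID
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443
EOF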

See their technical documentation and pricing.

More on RKE2 & OpenStack

RKE2 cheat sheet

# aliases already set on the nodes
crictl
kubectl (server only)

# logs
sudo systemctl status rke2-server.service
journalctl -f -u rke2-server

sudo systemctl status rke2-agent.service
journalctl -f -u rke2-agent

less /var/lib/rancher/rke2/agent/logs/kubelet.log
less /var/lib/rancher/rke2/agent/containerd/containerd.log
less /var/log/cloud-init-output.log

# check certificate SANs
openssl s_client -connect 192.168.42.3:10250 </dev/null 2>/dev/null | openssl x509 -inform pem -text

# defrag etcd
kubectl -n kube-system exec $(kubectl -n kube-system get pod -l component=etcd --no-headers -o custom-columns=NAME:.metadata.name | head -1) -- sh -c "ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/rke2/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/rke2/server/tls/etcd/server-client.key' ETCDCTL_API=3 etcdctl defrag --cluster"

# increase volume size
# shutdown the instance
# detach the volume
# expand the volume
# recreate the node
terraform apply -target='module.rke2.module.servers["server"]' -replace='module.rke2.module.servers["server"].openstack_compute_instance_v2.instance[0]'
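# the shutdown/detach/expand steps above can be done with the openstack CLI, for example
# (a sketch: the server and volume names as well as the size are placeholders):
openstack server stop SERVER_NAME
openstack server remove volume SERVER_NAME VOLUME_NAME
openstack volume set --size NEW_SIZE_GB VOLUME_NAME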

Migration guide

From v2 to v3

# 1. use the previous patch version (2.0.7) to set up an additional SAN for 192.168.42.4
# this will become the new VIP inside the cluster and replace the load-balancer:
source  = "zifeo/rke2/openstack"
version = "2.0.7"
# ...
additional_san = ["192.168.42.4"]
# 2. run a full upgrade with it, node by node:
terraform apply -target='module.rke2.module.servers["your-server-pool"]'
# 3. you can now switch to the new major and remove the additional_san:
source  = "zifeo/rke2/openstack"
version = "3.0.0"
# 4. create the new external IP for admin access (it will differ from the load-balancer's IP) with:
terraform apply -target='module.rke2.openstack_networking_floatingip_associate_v2.fip'
# 5. pick a server different from the initial one (used to bootstrap):
terraform apply -target='module.rke2.module.servers["server-c"].openstack_networking_port_v2.port'
# 6. give that server control of the VIP
ssh ubuntu@server-c
sudo su
modprobe ip_vs
modprobe ip_vs_rr
cat <<EOF > /var/lib/rancher/rke2/agent/pod-manifests/kube-vip.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-vip
  namespace: kube-system
spec:
  containers:
    - name: kube-vip
      image: ghcr.io/kube-vip/kube-vip:v0.7.2
      imagePullPolicy: IfNotPresent
      args:
        - manager
      env:
        - name: vip_arp
          value: "true"
        - name: port
          value: "6443"
        - name: vip_interface
          value: ens3
        - name: vip_cidr
          value: "32"
        - name: cp_enable
          value: "true"
        - name: cp_namespace
          value: kube-system
        - name: vip_ddns
          value: "false"
        - name: svc_enable
          value: "false"
        - name: vip_leaderelection
          value: "true"
        - name: vip_leasename
          value: plndr-cp-lock
        - name: vip_leaseduration
          value: "15"
        - name: vip_renewdeadline
          value: "10"
        - name: vip_retryperiod
          value: "2"
        - name: enable_node_labeling
          value: "true"
        - name: lb_enable
          value: "true"
        - name: lb_port
          value: "6443"
        - name: lb_fwdmethod
          value: local
        - name: address
          value: 192.168.42.4
        - name: prometheus_server
          value: ":2112"
      resources:
        requests:
          cpu: 100m
          memory: 64Mi
        limits:
          memory: 64Mi
      securityContext:
        capabilities:
          add:
            - NET_ADMIN
            - NET_RAW
      volumeMounts:
        - mountPath: /etc/kubernetes/admin.conf
          name: kubeconfig
  restartPolicy: Always
  hostAliases:
    - hostnames:
        - kubernetes
      ip: 127.0.0.1
  hostNetwork: true
  volumes:
    - name: kubeconfig
      hostPath:
        path: /etc/rancher/rke2/rke2.yaml
EOF
# 7. you should see a pod in kube-system starting with kube-vip (investigate if failing)
# then apply the migration to the initial/bootstrapping server:
terraform apply -target='module.rke2.module.servers["server-a"]'
terraform apply -target='module.rke2.openstack_networking_secgroup_rule_v2.outside_servers'
# 8. the cluster IP has now changed, and you should update your kubeconfig with the new IP (look it up in Horizon)
# 9. import the load-balancer and its IP elsewhere if still used (otherwise they will be destroyed)
cat <<EOF > lb.tf
resource "openstack_lb_loadbalancer_v2" "lb" {
  name                  = "lb"
  vip_network_id        = module.rke2.network_id
  vip_subnet_id         = module.rke2.lb_subnet_id
  lifecycle {
    ignore_changes = [
      tags
    ]
  }
}
resource "openstack_networking_floatingip_v2" "external" {
  pool    = "ext-floating1"
  port_id = openstack_lb_loadbalancer_v2.lb.vip_port_id
}
EOF
terraform state show module.rke2.openstack_lb_loadbalancer_v2.lb
terraform import openstack_lb_loadbalancer_v2.lb ID
terraform state rm module.rke2.openstack_lb_loadbalancer_v2.lb
terraform state show module.rke2.openstack_networking_floatingip_v2.external
terraform import openstack_networking_floatingip_v2.external ID
terraform state rm module.rke2.openstack_networking_floatingip_v2.external
# 10. continue upgrading the other nodes step by step as you normally would:
terraform apply -target='module.rke2.module.POOL["NODE"]'
# 11. once all the nodes are upgraded, make sure that everything is fully applied:
terraform apply