This Terraform module installs a high-availability K3s cluster with an embedded DB in a private network on Hetzner Cloud. The following resources are provisioned by default (about 20 €/month):
- 3x Control plane: CX11, 2 GB RAM, 1 vCPU, 20 GB NVMe, 20 TB traffic.
- 2x Worker: CX21, 4 GB RAM, 2 vCPU, 40 GB NVMe, 20 TB traffic.
- Network: Private network with one subnet.
- Server and agent nodes are distributed across 3 Datacenters (nbg1, fsn1, hel1) for high availability.
Hetzner Cloud integration:
- Preinstalled CSI-driver for volume support.
- Preinstalled Cloud Controller Manager for Hetzner Cloud for Load Balancer support.
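
A minimal usage sketch (the module source address, provider wiring, and SSH key resource below are assumptions for illustration; only `hcloud_token`, `cluster_name`, and `ssh_keys` are required, see the inputs table further down):

```hcl
# Minimal sketch, not the module's official example: the source address,
# provider setup and SSH key are placeholders/assumptions.
terraform {
  required_providers {
    hcloud = {
      source = "hetznercloud/hcloud"
    }
  }
}

variable "hcloud_token" {
  type      = string
  sensitive = true
}

provider "hcloud" {
  token = var.hcloud_token
}

resource "hcloud_ssh_key" "admin" {
  name       = "admin"
  public_key = file(pathexpand("~/.ssh/id_ed25519.pub"))
}

module "my_cluster" {
  source = "<path-or-registry-address-of-this-module>" # placeholder

  # Required inputs (see the inputs table further down)
  hcloud_token = var.hcloud_token
  cluster_name = "core"
  ssh_keys     = [hcloud_ssh_key.admin.id]

  # Optional overrides, shown here with their default values
  control_plane_server_count = 3
  control_plane_server_type  = "cx11"
  create_kubeconfig          = true
}
```

With `create_kubeconfig = true` the module writes a local kubeconfig file (see `kubeconfig_filename`); the example folder mentioned below contains a full walk-through.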
Auto-K3s-Upgrades
Enable the upgrade controller (`enable_upgrade_controller = true`) and specify your target k3s version (`upgrade_k3s_target_version`). See https://github.com/k3s-io/k3s/releases for possible versions.
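
For example, extending a module block like the usage sketch above (the target version string is a placeholder; pick a tag from the k3s releases page):

```hcl
module "my_cluster" {
  # ... the inputs shown in the usage sketch above ...

  # Deploy the rancher system-upgrade-controller and its upgrade plans.
  enable_upgrade_controller = true

  # Placeholder: use a release tag from
  # https://github.com/k3s-io/k3s/releases that is newer than the
  # currently installed k3s_version.
  upgrade_k3s_target_version = "<k3s-release-tag>"

  # Optional: pin the controller image tag (defaults to "v0.8.0").
  upgrade_controller_image_tag = "v0.8.0"
}
```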
Label the nodes you want to upgrade, e.g. `kubectl label nodes core-control-plane-1 k3s-upgrade=true`. The concurrency of the upgrade plan is set to 1, so you can also label them all at once. Agent nodes will be drained one by one during the upgrade.
You can label all control-plane nodes by using `kubectl label nodes -l node-role.kubernetes.io/control-plane=true k3s-upgrade=true`.
All agent nodes can be labelled using `kubectl label nodes -l 'node-role.kubernetes.io/control-plane!=true' k3s-upgrade=true`.
To remove the label from all nodes, you can run `kubectl label nodes --all k3s-upgrade-`.
After a successful upgrade you can remove the upgrade controller and the plans again by setting `enable_upgrade_controller` back to `false`.
What is K3s?
K3s is a lightweight, certified Kubernetes distribution. It is packaged as a single binary and comes with solid defaults for storage and networking, but this module replaces the local-path-provisioner with the Hetzner CSI driver and the Klipper load balancer with the Hetzner Cloud Controller Manager. The default ingress controller (Traefik) has been disabled.
See a more detailed example with a walk-through in the example folder.

Inputs

Name | Description | Type | Default | Required |
---|---|---|---|---|
agent_groups | Configuration of agent groups | map(object({ | { | no |
cluster_cidr | Network CIDR to use for pod IPs | string | "10.42.0.0/16" | no |
control_plane_server_count | Number of control plane nodes | number | 3 | no |
control_plane_server_type | Server type of control plane servers | string | "cx11" | no |
create_kubeconfig | Create a local kubeconfig file to connect to the cluster | bool | true | no |
enable_upgrade_controller | Install the rancher system-upgrade-controller | bool | false | no |
hcloud_csi_driver_version | n/a | string | "v1.6.0" | no |
hcloud_token | Token to authenticate against Hetzner Cloud | any | n/a | yes |
k3s_version | K3s version | string | "v1.21.3+k3s1" | no |
kubeconfig_filename | Specify the filename of the created kubeconfig file (defaults to kubeconfig-${var.name}.yaml) | any | null | no |
cluster_name | Cluster name (used in various places, don't use special chars) | any | n/a | yes |
network_cidr | Network in which the cluster will be placed. Ignored if network_id is defined | string | "10.0.0.0/16" | no |
network_id | If specified, no new network will be created. Make sure cluster_cidr and service_cidr don't collide with anything in the existing network. | any | null | no |
server_additional_packages | Additional packages which will be installed on node creation | list(string) | [] | no |
server_locations | Server locations in which servers will be distributed | list(string) | ["nbg1", "fsn1", "hel1"] | no |
service_cidr | Network CIDR to use for services IPs | string | "10.43.0.0/16" | no |
ssh_keys | List of ssh key resource IDs that will be installed on the servers | list(string) | n/a | yes |
subnet_cidr | Subnet in which all nodes are placed | string | "10.0.1.0/24" | no |
upgrade_controller_image_tag | The image tag of the upgrade controller (See https://github.com/rancher/system-upgrade-controller/releases) | string | "v0.8.0" | no |
upgrade_k3s_target_version | Target version of k3s (See https://github.com/k3s-io/k3s/releases) | string | null | no |
upgrade_node_additional_tolerations | List of tolerations which upgrade jobs must have to run on every node (for control-plane and agents) | list(map(any)) | [] | no |
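
As a sketch of how the networking inputs fit together, the following places the cluster into an already existing Hetzner network instead of creating a new one. The network name, CIDR values, and the extra package are illustrative assumptions, and it reuses the `hcloud_ssh_key` and token from the usage sketch above:

```hcl
# Sketch: join an existing network. When network_id is set the module
# skips network creation; make sure cluster_cidr and service_cidr do
# not collide with anything already routed in that network.
data "hcloud_network" "existing" {
  name = "shared-net" # assumed: an existing network in your project
}

module "my_cluster" {
  source = "<path-or-registry-address-of-this-module>" # placeholder

  hcloud_token = var.hcloud_token
  cluster_name = "core"
  ssh_keys     = [hcloud_ssh_key.admin.id]

  # Reuse the existing network; the node subnet must fit inside
  # that network's IP range.
  network_id  = data.hcloud_network.existing.id
  subnet_cidr = "10.0.1.0/24"

  # Pod and service ranges (defaults shown) must stay collision-free.
  cluster_cidr = "10.42.0.0/16"
  service_cidr = "10.43.0.0/16"

  # Spread nodes over the three supported locations (the default).
  server_locations = ["nbg1", "fsn1", "hel1"]

  # Illustrative: extra OS packages installed on node creation.
  server_additional_packages = ["nfs-common"]
}
```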

Outputs

Name | Description |
---|---|
agents_public_ips | The public IP addresses of the agent servers |
cidr_block | n/a |
control_planes_public_ips | The public IP addresses of the control plane servers |
k3s_token | Secret k3s authentication token |
kubeconfig | Structured kubeconfig data to supply to other providers |
kubeconfig_file | Kubeconfig file content with external IP address |
network_id | n/a |
server_locations | Array of hetzner server locations we deploy to |
subnet_id | n/a |
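
A short sketch of consuming these outputs from a root module, for instance persisting `kubeconfig_file` yourself (e.g. with `create_kubeconfig = false`). The filename and the use of the hashicorp/local provider are assumptions:

```hcl
# Sketch: write the kubeconfig output to disk and re-export two of
# the module outputs from the root module.
resource "local_sensitive_file" "kubeconfig" {
  content         = module.my_cluster.kubeconfig_file
  filename        = "${path.root}/kubeconfig-core.yaml" # assumed name
  file_permission = "0600"
}

output "control_planes_public_ips" {
  value = module.my_cluster.control_planes_public_ips
}

output "agents_public_ips" {
  value = module.my_cluster.agents_public_ips
}
```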
If you need to cycle an agent, you can do that one node at a time with the following procedure. Replace the group name and index with the server you want to recreate, and make sure you drain the node first.
`terraform taint 'module.my_cluster.module.agent_group["GROUP_NAME"].random_pet.agent_suffix[1]'`
`terraform apply`
This will recreate the agent in that group on the next apply.
To cycle a control-plane node, currently you should only replace servers which didn't initialize the cluster.
`terraform taint 'module.my_cluster.hcloud_server.control_plane["#1"]'`
`terraform apply`
- Install the system-upgrade-controller in your cluster:
  `KUBECONFIG=kubeconfig.yaml kubectl apply -f ./upgrade/controller.yaml`
- Mark the nodes you want to upgrade (the following command marks all nodes):
  `KUBECONFIG=kubeconfig.yaml kubectl label --all node k3s-upgrade=true`
- Run the plan for the servers:
  `KUBECONFIG=kubeconfig.yaml kubectl apply -f ./upgrade/server-plan.yaml`
  Warning: Wait for completion before you start upgrading your agents.
- Run the plan for the agents:
  `KUBECONFIG=kubeconfig.yaml kubectl apply -f ./upgrade/agent-plan.yaml`
K3s automatically backs up the embedded etcd datastore every 12 hours to `/var/lib/rancher/k3s/server/db/snapshots/`. You can reset the cluster by pointing it to a specific snapshot.
- Stop the master server:
  `sudo systemctl stop k3s`
- Restore the master server with a snapshot:
./k3s server \
--cluster-reset \
--cluster-reset-restore-path=<PATH-TO-SNAPSHOT>
Warning: This forgets all peers and the server becomes the sole member of a new cluster. You have to manually rejoin all the other servers.
- Connect to each of the other servers, back up and delete `/var/lib/rancher/k3s/server/db` on every peer etcd server, and rejoin the nodes:
  `sudo systemctl stop k3s`
  `sudo rm -rf /var/lib/rancher/k3s/server/db`
  `sudo systemctl start k3s`
This will rejoin the server with the master server and seed the etcd store.
Info: There is no official tool to automate this procedure. In the future, Rancher might provide an operator to handle this (issue).
Cloud-init logs can be found on the remote machines in:
- `/var/log/cloud-init-output.log`
- `/var/log/cloud-init.log`

K3s service logs can be inspected with:
- `journalctl -u k3s.service -e` for the last logs of the server
- `journalctl -u k3s-agent.service -e` for the last logs of the agent
- Sometimes during cluster bootstrapping, the Cloud Controller Manager reports that some routes couldn't be created. This issue is fixed in master but not yet released. Restart the cloud-controller pod and it will recreate the routes.
- terraform-hcloud-k3s: Terraform module which creates a single-node cluster.
- terraform-module-k3s: Terraform module which creates a k3s cluster with multi-server and management features.
- Icon created by Freepik from www.flaticon.com