vsphere_rancher_cluster

Terraform plan for creating a hardened multi-node RKE2 cluster on VMware vSphere


RKE2 Cluster with vSphere CPI/CSI & kube-vip


Reason for Being

This Terraform plan is for creating a multi-node CIS Benchmarked RKE2 cluster with vSphere CPI/CSI & kube-vip installed and configured. RKE2's NGINX Ingress Controller is also exposed as a LoadBalancer service to work in concert with kube-vip. Along with those quality-of-life additions, this cluster plan takes the standard RKE2 security posture a couple of steps further by installing with the CIS 1.23 Profile enabled, using Calico's WireGuard backend to encrypt pod-to-pod communication, & enforcing the use of TLS 1.3 across Control Plane components.

There is a lot of HereDoc in the rke_config section of cluster.tf so that it's easier to see what's going on - you'll probably want to put this info in a template file to keep the plan a bit neater than what's seen here.
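
To give a sense of the shape, below is a heavily trimmed sketch of such an rke_config block. The resource and attribute names follow the rancher2 provider, but the values are only illustrative of the hardening described above - not a copy of this plan's cluster.tf:

resource "rancher2_cluster_v2" "rke2" {
  name               = "vsphere-rke2"      # illustrative name
  kubernetes_version = "v1.26.8+rke2r1"

  rke_config {
    # RKE2 config.yaml content applied to every node, inlined as a HereDoc.
    machine_global_config = <<-EOF
      cni: calico
      profile: cis-1.23                # CIS 1.23 hardening profile
      kube-apiserver-arg:
        - tls-min-version=VersionTLS13 # enforce TLS 1.3
      kube-controller-manager-arg:
        - tls-min-version=VersionTLS13
      kube-scheduler-arg:
        - tls-min-version=VersionTLS13
    EOF

    # Calico's WireGuard backend and the LoadBalancer Service for
    # rke2-ingress-nginx are driven through chart_values (omitted here).
  }
}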

Some operating systems will run containerd with the "systemd" cgroup driver and the Kubelet with the "cgroupfs" driver - this plan passes a --cgroup-driver=systemd argument to the Kubelet to ensure there is only a single cgroup manager running, better aligning the cluster with upstream K8s recommendations (see: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers).
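
Concretely, that can be as small as one extra entry in the same machine_global_config HereDoc sketched above (shown standalone here for clarity):

machine_global_config = <<-EOF
  # Make the Kubelet use the same cgroup manager as containerd; in practice this
  # is merged with the rest of the RKE2 config.yaml content shown earlier.
  kubelet-arg:
    - cgroup-driver=systemd
EOF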

Static IP Addressing

Static IPs can be implemented if needed. Firstly, a Network Protocol Profile needs to be created in vSphere. After the profile is created, two parts of this Terraform plan need to be changed: cloud-init and the rancher2_machine_config_v2 resource in cluster.tf.

  1. A script must be added with write_files and executed via runcmd in cloud-init. This script gathers instance metadata via VMware Tools (vmtoolsd) and then applies it (the example below uses Netplan; your OS may use something different):
write_files:
  - content: |
      #!/bin/bash
      vmtoolsd --cmd 'info-get guestinfo.ovfEnv' > /tmp/ovfenv
      IPAddress=$(sed -n 's/.*Property oe:key="guestinfo.interface.0.ip.0.address" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
      SubnetMask=$(sed -n 's/.*Property oe:key="guestinfo.interface.0.ip.0.netmask" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
      Gateway=$(sed -n 's/.*Property oe:key="guestinfo.interface.0.route.0.gateway" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
      DNS=$(sed -n 's/.*Property oe:key="guestinfo.dns.servers" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)

      # NOTE: the prefix length is hardcoded to /24 below; adjust it (or derive it from $SubnetMask) for other subnets
      cat > /etc/netplan/01-netcfg.yaml <<EOF
      network:
        version: 2
        renderer: networkd
        ethernets:
          ens192:
            addresses:
              - $IPAddress/24
            gateway4: $Gateway
            nameservers:
              addresses: [$DNS]
      EOF

      netplan apply
    path: /root/netplan.sh

runcmd:
  - bash /root/netplan.sh
  2. The additions below need to be made to the vsphere_config block of rancher2_machine_config_v2. This example would apply static IPv4 addresses to only the ctl_plane node pool (the for_each shape that each.key refers to is sketched after this snippet):
vapp_ip_allocation_policy = each.key == "ctl_plane" ? "fixedAllocated" : null
vapp_ip_protocol          = each.key == "ctl_plane" ? "IPv4" : null
vapp_property = each.key == "ctl_plane" ? [
  "guestinfo.interface.0.ip.0.address=ip:<vSwitch_from_Network_Protocol_Profile>",
  "guestinfo.interface.0.ip.0.netmask=$${netmask:<vSwitch_from_Network_Protocol_Profile>}",
  "guestinfo.interface.0.route.0.gateway=$${gateway:<vSwitch_from_Network_Protocol_Profile>}",
  "guestinfo.dns.servers=$${dns:<vSwitch_from_Network_Protocol_Profile>}",
] : null
vapp_transport = each.key == "ctl_plane" ? "com.vmware.guestInfo" : null
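
For context, the each.key comparisons assume one machine config per node pool created with for_each; a minimal sketch of that shape (pool names and the resource label are illustrative):

resource "rancher2_machine_config_v2" "node_pools" {
  for_each      = toset(["ctl_plane", "etcd", "worker"])
  generate_name = replace(each.key, "_", "-")

  vsphere_config {
    # ...clone, datastore & network settings elided...
    # The vapp_* additions above slot in here, gated on each.key.
  }
}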

Using static IPs comes with some small caveats:

  • In lieu of the "traditional" cloud-init logic to handle OS updates/upgrades & package installs:
package_reboot_if_required: true
package_update: true
package_upgrade: true
packages:
  - <insert_awesome_package_name_here>

Scripting would need to be introduced to take care of this later in the cloud-init process, if desired (e.g. a script delivered via write_files with defer: true). The package_* modules run before the runcmd commands are executed - and therefore before the static IP has been applied - so the node would not yet have the network access needed to complete any package* logic; see the sketch below.
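
A rough sketch of that deferred approach, written as the kind of cloud-init content this plan already templates in (the script name and apt-get commands are placeholders for whatever your OS and package set require):

locals {
  # Illustrative only: packages are installed by a runcmd script that runs
  # after the netplan script has brought networking up, instead of relying
  # on the package_* modules (which would run before the static IP exists).
  deferred_install_cloud_init = <<-EOF
    write_files:
      - path: /root/install-packages.sh
        defer: true   # written during cloud-init's final stage, per the note above
        content: |
          #!/bin/bash
          apt-get update && apt-get -y upgrade
          apt-get -y install <insert_awesome_package_name_here>
    runcmd:
      - bash /root/netplan.sh
      - bash /root/install-packages.sh
  EOF
}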

Environment Prerequisites

  • Functional Rancher Management Server with vSphere Cloud Credential

  • vCenter >= 7.x and credentials with appropriate permissions (see vSphere Permissions section)

  • Virtual Machine Hardware Compatibility at Version >= 15

  • Create the following in the files/ directory (consumed by the plan roughly as sketched after this table):

    NAME                   PURPOSE
    .rancher-api-url       URL for Rancher Management Server
    .rancher-bearer-token  API bearer token generated via Rancher UI
    .ssh-public-key        SSH public key for additional OS user
    .vsphere-passwd        Password associated with vSphere CPI/CSI credential
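
As a rough sketch of how those files are typically consumed (the actual wiring in this plan's provider and variable definitions may differ):

provider "rancher2" {
  api_url   = trimspace(file("${path.module}/files/.rancher-api-url"))
  token_key = trimspace(file("${path.module}/files/.rancher-bearer-token"))
}

locals {
  ssh_public_key = trimspace(file("${path.module}/files/.ssh-public-key"))
  vsphere_passwd = trimspace(file("${path.module}/files/.vsphere-passwd"))
}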

vSphere Permissions

For required vSphere CPI & CSI account permissions see HERE.

Caveats

To Run

terraform apply

Tested Versions

SOFTWARE                    VERSION        DOCS
kube-vip                    0.6.2          https://kube-vip.io/docs/
Rancher Server              2.7.6          https://ranchermanager.docs.rancher.com/
Rancher Terraform Provider  3.1.1          https://registry.terraform.io/providers/rancher/rancher2/latest/docs
RKE2                        1.26.8+rke2r1  https://docs.rke2.io
Terraform                   1.4.6          https://www.terraform.io/docs
vSphere                     8.0.1.00300    https://docs.vmware.com/en/VMware-vSphere/index.html