                      +------------+
                      | Login Node |
                      +------+-----+
                             |
         +-------------------+-------------------+
         |                   |                   |
         v                   v                   v
+-----------------+ +-----------------+ +-----------------+
|  Master Node 1  | |  Master Node 2  | |  Master Node 3  |
| (Control Plane) | | (Control Plane) | | (Control Plane) |
+--------+--------+ +--------+--------+ +--------+--------+
         |                   |                   |
         v                   v                   v
+-----------------+ +-----------------+ +-----------------+
|  Worker Node 1  | |  Worker Node 2  | |  Worker Node 3  |
+-----------------+ +-----------------+ +-----------------+
This guide is adapted from the https://github.com/sifulan-access-federation/ifirexman-training guide, which was written for Rocky Linux 8. [ ;) Thanks Sifu Farhan ]
In this tutorial, we are going to set up a Kubernetes cluster by using the Rancher Kubernetes Engine 2 (RKE2). RKE2 is a CNCF-certified Kubernetes distribution that focuses on security and compliance within the U.S. Federal Government sector. It solves the common frustration of installation complexity with Kubernetes by removing most host dependencies and presenting a stable path for deployment, upgrades, and rollbacks. Before we start, make sure you have the following pre-flight checklist ready.
Please refer to the KubernetesCluster.pdf file for the pre-flight checklist.
- Login to the Login node via SSH.
- Generate an SSH key by using the ssh-keygen command:
ssh-keygen -t ecdsa
You don't have to set a passphrase for the private key for this purpose.
- Copy the public key to each Kubernetes node. In this tutorial, we assume that a service account/user username exists on each node:
ssh-copy-id -i ~/.ssh/id_ecdsa.pub username@<kubernetes node ip address>
- Try to login to each Kubernetes node by using SSH. If you are able to login to each node without having to key in the password, then you have successfully set up passwordless login to the Kubernetes nodes.
The steps below are meant for each Kubernetes node. You need to perform these steps as user root or as a user with sudo permission. If you choose the latter, you need to add sudo at the start of each command.
Create the following file, save it as rke2-canal.conf, and place it in /etc/NetworkManager/conf.d:
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:flannel*
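If you prefer to create the file in a single step, a minimal sketch (assuming NetworkManager is in use, as this guide implies) is:
cat > /etc/NetworkManager/conf.d/rke2-canal.conf <<'EOF'
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:flannel*
EOF
# The reboot performed at the end of this section will make NetworkManager pick up this configuration.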
In Ubuntu, you must disable the firewall using the commands below, and then reboot the node to restore connectivity:
systemctl stop ufw
systemctl disable ufw
We need to disable swap since Kubelet does not support swap yet.
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
swapoff -a
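You can confirm that swap is now off; for example, swapon --show should print nothing and free -h should report 0B of swap:
swapon --show
free -h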
- Load the br_netfilter module:
modprobe br_netfilter
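The modprobe command above only lasts until the next reboot. As a small addition (assuming systemd's modules-load.d mechanism, which Ubuntu uses), you can make the module load persistent with:
echo br_netfilter > /etc/modules-load.d/br_netfilter.conf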
- Create the /etc/sysctl.d/kubernetes.conf file and insert the following values:
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
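The reboot below will apply these settings, but if you want them to take effect immediately you can also reload them with:
sysctl --system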
Finally, restart the node.
reboot
Run the following commands to install and enable Docker Engine on the login node as user root:
curl https://releases.rancher.com/install-docker/20.10.sh | sh
usermod -aG docker <username>
systemctl enable --now docker
To set the docker0 subnet IP range so that it will not conflict with the existing UMT WiFi network (172.17.*), do the following:
Edit or create the config file for the Docker daemon:
nano /etc/docker/daemon.json
Add the following lines:
{
  "default-address-pools": [
    {"base": "10.10.0.0/16", "size": 24}
  ]
}
Restart the Docker daemon:
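On a systemd-based system such as Ubuntu, this is typically done with:
systemctl restart docker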
Check the result:
$ docker network create foo
$ docker network inspect foo | grep Subnet
"Subnet": "10.10.1.0/24"
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
To install git on the login node:
apt install -y git
The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You can use kubectl to deploy applications, inspect and manage cluster resources, and view logs. For more information, including a complete list of kubectl operations, see the kubectl reference documentation.
To install kubectl on the login node:
curl -Lo kubectl "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
mv kubectl /usr/local/bin
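You can verify that the kubectl binary is installed and on your PATH with, for example:
kubectl version --client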
kubectx is a tool to switch between contexts (clusters) on kubectl faster, while kubens is a tool to switch between Kubernetes namespaces (and configure them for kubectl) easily.
To install kubectx and kubens on the login node:
git clone https://github.com/ahmetb/kubectx /opt/kubectx
ln -s /opt/kubectx/kubectx /usr/local/bin/kubectx
ln -s /opt/kubectx/kubens /usr/local/bin/kubens
K9s is a terminal-based UI to interact with your Kubernetes clusters. K9s continually watches Kubernetes for changes and offers subsequent commands to interact with your observed resources.
To install k9s on the login node:
curl -Lo k9s_Linux_amd64.tar.gz "https://github.com/derailed/k9s/releases/download/v0.27.4/k9s_Linux_amd64.tar.gz"
tar -C /usr/local/bin -zxf k9s_Linux_amd64.tar.gz k9s
Helm helps you manage Kubernetes applications: it lets you define, install, and upgrade even the most complex Kubernetes application.
To install helm on the login node:
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
RKE2 provides an installation script that is a convenient way to install it as a service on systemd-based systems. This script is available at https://get.rke2.io. To install RKE2 using this method, do the following as user root on the first node:
curl -sfL https://get.rke2.io -o install.sh
chmod +x install.sh
INSTALL_RKE2_CHANNEL=stable INSTALL_RKE2_TYPE="server" ./install.sh
Create the /etc/rancher/rke2/config.yaml file, and insert the following values:
tls-san:
- <node 1 fqdn>
- <node 2 fqdn>
- <node 3 fqdn>
node-taint:
- "CriticalAddonsOnly=true:NoExecute"
disable: rke2-ingress-nginx
write-kubeconfig-mode: 644
cluster-cidr: 10.42.0.0/16
service-cidr: 10.43.0.0/16
If you would like to enable IPv6 natively on your Kubernetes cluster, you can use the following configuration instead:
tls-san:
- <node 1 fqdn>
- <node 2 fqdn>
- <node 3 fqdn>
node-taint:
- "CriticalAddonsOnly=true:NoExecute"
disable: rke2-ingress-nginx
write-kubeconfig-mode: 644
cluster-cidr: 10.42.0.0/16,fd42:1:1::/56
service-cidr: 10.43.0.0/16,fd43:1:1::/112
Enable and start the RKE2 server service:
systemctl enable rke2-server.service
systemctl start rke2-server.service
Once the service is running, retrieve the node token:
cat /var/lib/rancher/rke2/server/node-token
Take note of the pre-shared secret key. You will need it to join the other nodes to the cluster.
If you later make changes to config.yaml, restart the service for them to take effect:
systemctl restart rke2-server.service
Login to node 2 and node 3, and do the following steps:
curl -sfL https://get.rke2.io -o install.sh
chmod +x install.sh
INSTALL_RKE2_CHANNEL=stable INSTALL_RKE2_TYPE="server" ./install.sh
Create the /etc/rancher/rke2/config.yaml file, and insert the following values:
server: https://<node 1 fqdn>:9345
token: <token>
write-kubeconfig-mode: "0644"
tls-san:
- <node 1 fqdn>
- <node 2 fqdn>
- <node 3 fqdn>
node-taint:
- "CriticalAddonsOnly=true:NoExecute"
disable: rke2-ingress-nginx
- Replace <node 1 fqdn>, <node 2 fqdn>, and <node 3 fqdn> with the fully qualified domain names of node 1, 2, and 3.
- Replace <token> with the pre-shared secret key you copied in the previous step.
Enable and start the RKE2 server service:
systemctl enable rke2-server.service
systemctl start rke2-server.service
Generally, the procedure to set up a worker node is the same as the procedure to set up a server node. The only difference is that you need to set the INSTALL_RKE2_TYPE flag to agent when you run the install.sh script.
Below is the complete procedure to set up a worker node (repeat these steps for node 4, 5, and 6):
Login to the worker node by using SSH.
curl -sfL https://get.rke2.io -o install.sh
chmod +x install.sh
INSTALL_RKE2_CHANNEL=stable INSTALL_RKE2_TYPE="agent" ./install.sh
Create the /etc/rancher/rke2/config.yaml file, and insert the following values:
server: https://<node 1 fqdn>:9345
token: <token>
Replace <node 1 fqdn> with the fully qualified domain name of node 1. Replace <token> with the pre-shared secret key you copied in the previous step.
Enable and start the RKE2 agent service:
systemctl enable rke2-agent.service
systemctl start rke2-agent.service
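If you want to confirm that the agent started cleanly before moving on, you can, for example, check the service status and follow its logs:
systemctl status rke2-agent.service
journalctl -u rke2-agent -f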
To ensure that your cluster of nodes is running and your host machine can connect to your cluster, do the following steps:
At the login node, create a ~/.kube folder:
mkdir ~/.kube
From the first node, copy the kubeconfig file to the login node:
scp /etc/rancher/rke2/rke2.yaml username@<login node fqdn>:~/.kube/config
Replace <login node fqdn> with the fully qualified domain name of the login node. You need to perform this command as user root.
At the login node, edit the ~/.kube/config file, and replace server: https://127.0.0.1:6443 with server: https://<node 1 fqdn>:6443.
Change the file permission:
chmod 400 ~/.kube/config
At the login node, run the following command:
kubectl get nodes
You should see the following output:
NAME STATUS ROLES AGE VERSION
<node 1 fqdn> Ready control-plane,etcd,master 30m v1.24.10+rke2r1
<node 2 fqdn> Ready control-plane,etcd,master 30m v1.24.10+rke2r1
<node 3 fqdn> Ready control-plane,etcd,master 30m v1.24.10+rke2r1
<node 4 fqdn> Ready <none> 5m v1.24.10+rke2r1
<node 5 fqdn> Ready <none> 5m v1.24.10+rke2r1
<node 6 fqdn> Ready <none> 5m v1.24.10+rke2r1
Notice that the ROLES for nodes 4, 5, and 6 are <none>. We need to label them as worker nodes. To do that, run the following commands:
kubectl label node <node 4 fqdn> node-role.kubernetes.io/worker=worker
kubectl label node <node 5 fqdn> node-role.kubernetes.io/worker=worker
kubectl label node <node 6 fqdn> node-role.kubernetes.io/worker=worker
Run the kubectl get nodes
command again, and you should see the following output:
NAME STATUS ROLES AGE VERSION
<node 1 fqdn> Ready control-plane,etcd,master 31m v1.24.10+rke2r1
<node 2 fqdn> Ready control-plane,etcd,master 31m v1.24.10+rke2r1
<node 3 fqdn> Ready control-plane,etcd,master 31m v1.24.10+rke2r1
<node 4 fqdn> Ready worker 6m v1.24.10+rke2r1
<node 5 fqdn> Ready worker 6m v1.24.10+rke2r1
<node 6 fqdn> Ready worker 6m v1.24.10+rke2r1
If you see the above output, your Kubernetes cluster is ready to use.
Longhorn is a lightweight, reliable, and powerful distributed block storage system for Kubernetes. Longhorn implements distributed block storage using containers and microservices. The storage controller and replicas themselves are orchestrated using Kubernetes. We will use Longhorn to provide persistent storage for the cluster.
For each worker node:
- Login to the node by using SSH.
- Create the /var/lib/longhorn folder:
mkdir /var/lib/longhorn
- Format the dedicated disk for data storage and mount it at the /var/lib/longhorn folder:
mkfs.ext4 /dev/sdb
mount /dev/sdb /var/lib/longhorn
In the above example, it is assumed that the device name of the dedicated disk for data storage is /dev/sdb. You can use the fdisk -l command to find the actual device name on your system. To make the operating system automatically mount the disk upon booting, you can add the following entry as the last line of the /etc/fstab file:
/dev/sdb /var/lib/longhorn ext4 defaults 0 0
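To sanity-check the new fstab entry and the mount without rebooting, you can, for example, run:
findmnt --verify
df -h /var/lib/longhorn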
From the Login node:
- Install open-iscsi and the NFSv4 client:
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.4.1/deploy/prerequisite/longhorn-iscsi-installation.yaml
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.4.1/deploy/prerequisite/longhorn-nfs-installation.yaml
- After the deployment, run the following commands to check the status of the installer pods:
kubectl get pod | grep longhorn-iscsi-installation
kubectl get pod | grep longhorn-nfs-installation
The output should be similar to the following:
longhorn-iscsi-installation-7k9m6 1/1 Running 0 47s
longhorn-iscsi-installation-dfhxc 1/1 Running 0 47s
longhorn-iscsi-installation-xqvdp 1/1 Running 0 47s
longhorn-nfs-installation-2l9sp 1/1 Running 0 111s
longhorn-nfs-installation-n2zp8 1/1 Running 0 111s
longhorn-nfs-installation-sgwd5 1/1 Running 0 111s
You can also check the logs with the following commands to see the installation result:
kubectl logs longhorn-iscsi-installation-7k9m6 -c iscsi-installation
kubectl logs longhorn-nfs-installation-2l9sp -c nfs-installation
The output should be similar to the following:
Installed:
iscsi-initiator-utils-6.2.1.4-4.git095f59c.el8.x86_64
iscsi-initiator-utils-iscsiuio-6.2.1.4-4.git095f59c.el8.x86_64
isns-utils-libs-0.99-1.el8.x86_64
iscsi install successfully
Installed:
gssproxy-0.8.0-20.el8.x86_64 keyutils-1.5.10-9.el8.x86_64
libverto-libevent-0.3.0-5.el8.x86_64 nfs-utils-1:2.3.3-51.el8.x86_64
python3-pyyaml-3.12-12.el8.x86_64 quota-1:4.04-14.el8.x86_64
quota-nls-1:4.04-14.el8.noarch rpcbind-1.2.5-8.el8.x86_64
nfs install successfully
- Run the following command to ensure that the nodes have everything necessary to install Longhorn:
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.4.1/scripts/environment_check.sh | bash
The output should be similar to the following:
[INFO] Required dependencies are installed.
[INFO] Waiting for longhorn-environment-check pods to become ready (0/3)...
[INFO] All longhorn-environment-check pods are ready (3/3).
[INFO] Required packages are installed.
[INFO] Cleaning up longhorn-environment-check pods...
[INFO] Cleanup completed.
Note: jq may need to be installed locally prior to running the environment check script. To install jq on Ubuntu, run the following command:
apt install -y jq
On the login node:
- Install Longhorn on the Kubernetes cluster using this command:
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.4.1/deploy/longhorn.yaml
You can use the k9s tool or kubectl get pods -n longhorn-system to monitor the status. A successfully deployed Longhorn looks something like this:
NAME READY STATUS RESTARTS AGE
csi-attacher-dcb85d774-d6ct2 1/1 Running 0 7m16s
csi-attacher-dcb85d774-f4qjb 1/1 Running 0 7m16s
csi-attacher-dcb85d774-vpjzv 1/1 Running 0 7m16s
csi-provisioner-5d8dd96b57-2ww7k 1/1 Running 0 7m16s
csi-provisioner-5d8dd96b57-g58ts 1/1 Running 0 7m16s
csi-provisioner-5d8dd96b57-x49gc 1/1 Running 0 7m16s
csi-resizer-7c5bb5fd65-hvt59 1/1 Running 0 7m16s
csi-resizer-7c5bb5fd65-stmtd 1/1 Running 0 7m16s
csi-resizer-7c5bb5fd65-tj7tj 1/1 Running 0 7m16s
csi-snapshotter-5586bc7c79-869zp 1/1 Running 0 7m15s
csi-snapshotter-5586bc7c79-rqxpp 1/1 Running 0 7m15s
csi-snapshotter-5586bc7c79-zdxs5 1/1 Running 0 7m15s
engine-image-ei-766a591b-9nkw4 1/1 Running 0 7m24s
engine-image-ei-766a591b-b2f24 1/1 Running 0 7m24s
engine-image-ei-766a591b-xkn98 1/1 Running 0 7m24s
instance-manager-e-8d454591 1/1 Running 0 7m23s
instance-manager-e-d894e807 1/1 Running 0 7m24s
instance-manager-e-e5aa709b 1/1 Running 0 7m24s
instance-manager-r-0c0861f9 1/1 Running 0 7m23s
instance-manager-r-d2d51044 1/1 Running 0 7m24s
instance-manager-r-f6b6d7d8 1/1 Running 0 7m24s
longhorn-admission-webhook-858d86b96b-bmfr8 1/1 Running 0 7m54s
longhorn-admission-webhook-858d86b96b-c8hvh 1/1 Running 0 7m54s
longhorn-conversion-webhook-576b5c45c7-4gbrz 1/1 Running 0 7m54s
longhorn-conversion-webhook-576b5c45c7-sz4xj 1/1 Running 0 7m54s
longhorn-csi-plugin-dh475 2/2 Running 0 7m15s
longhorn-csi-plugin-dpljd 2/2 Running 0 7m15s
longhorn-csi-plugin-j9rzf 2/2 Running 0 7m15s
longhorn-driver-deployer-6687fb8b45-vhqhs 1/1 Running 0 7m54s
longhorn-manager-4ntvh 1/1 Running 0 7m54s
longhorn-manager-ln4gs 1/1 Running 0 7m54s
longhorn-manager-lttlz 1/1 Running 0 7m54s
longhorn-ui-86b56b95c8-xxmc7 1/1 Running 0 7m54s
- Once the installation is complete, you can check whether the Longhorn storage class was successfully created by using the command below:
kubectl get sc
The output should be something like this:
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
longhorn (default) driver.longhorn.io Delete Immediate true 49d
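As a quick illustration of how the longhorn storage class is consumed (the claim name test-pvc and the 1Gi size below are only examples, not part of this guide), a PersistentVolumeClaim manifest would look something like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
After applying it with kubectl apply -f, kubectl get pvc should show the claim as Bound once Longhorn has provisioned the volume.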
We are going to use MetalLB as the load balancer for our Kubernetes cluster and configure the NGINX ingress to take an IP address from it, connecting the external network with the pods.
On the login node:
- Run the following command to install MetalLB:
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.9/config/manifests/metallb-native.yaml
You can use the k9s tool or kubectl get pods -n metallb-system to monitor the status. A successfully installed MetalLB looks something like this:
NAME READY STATUS RESTARTS AGE
controller-6d5cb87f6-p2rp6 1/1 Running 0 63m
speaker-kz2l6 1/1 Running 0 63m
speaker-pqxh6 1/1 Running 0 63m
speaker-txd75 1/1 Running 0 63m
- Create IPAddressPool and L2Advertisement objects by creating a Kubernetes manifest file. To do so, create a metallb-configuration.yaml file and insert the following manifest:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: rke-ip-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.240-192.168.1.241
  - 2001:db8:1::1-2001:db8:1::2
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: rke-ip-pool-l2-advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
  - rke-ip-pool
Replace the IPv4 address range 192.168.1.240-192.168.1.241 with your dedicated private IP range as mentioned earlier.
Optionally, you can also replace the IPv6 address range 2001:db8:1::1-2001:db8:1::2 with your own IPv6 address range. If you do not have a public IPv6 address range, you can remove the IPv6 address range from the manifest.
- Apply the newly created manifest metallb-configuration.yaml:
kubectl apply -f metallb-configuration.yaml
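If you want to confirm that the objects were created, you can, for example, list them:
kubectl get ipaddresspools.metallb.io -n metallb-system
kubectl get l2advertisements.metallb.io -n metallb-system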
On the login node:
- Run the following command to install NGINX Ingress:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.7.1/deploy/static/provider/baremetal/deploy.yaml
You can use the k9s tool or kubectl get pods -n ingress-nginx to monitor the status. A successfully installed NGINX Ingress looks something like this:
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-4cvm5 0/1 Completed 0 32s
ingress-nginx-admission-patch-c5zbb 0/1 Completed 1 32s
ingress-nginx-controller-b7b55cccc-mcsjg 0/1 Running 0 32s
- We need to reconfigure NGINX Ingress to use LoadBalancer as the Service type. To do so, run the following command:
kubectl edit svc ingress-nginx-controller -n ingress-nginx
Under the spec, find the type parameter and change its value from NodePort to LoadBalancer. If you would like to enable IPv6, find the ipFamilyPolicy parameter and change its value from SingleStack to RequireDualStack, and find the ipFamilies parameter and add IPv6 under the list. Please refer to the Kubernetes Service Networking page for more information about the IPv6 settings for a Service in Kubernetes. (A non-interactive alternative to kubectl edit is shown after the sample output below.)
After that, you can save the manifest and check whether MetalLB has assigned an IP address from the rke-ip-pool by using the following command:
kubectl get svc ingress-nginx-controller -n ingress-nginx
You should have output something like this:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ingress-nginx-controller LoadBalancer 10.43.219.103 192.168.1.240,2001:db8:1::1 80:30825/TCP,443:31719/TCP 19m
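As a non-interactive alternative to kubectl edit for the type change above (the IPv6 dual-stack fields would still need to be edited by hand if you want them), you could patch the Service directly:
kubectl patch svc ingress-nginx-controller -n ingress-nginx -p '{"spec":{"type":"LoadBalancer"}}'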
We are going to use Cert-Manager to manage X.509 certificates, particularly to obtain certificates from Let's Encrypt, for our services. cert-manager is a powerful and extensible X.509 certificate controller for Kubernetes workloads. It obtains certificates from a variety of issuers, both popular public issuers and private issuers, ensures the certificates are valid and up-to-date, and attempts to renew certificates at a configured time before expiry.
Below are the steps to install Cert-Manager and use it to obtain a certificate from Let's Encrypt:
- Add the Cert-Manager Helm repository:
helm repo add jetstack https://charts.jetstack.io
- Update your local Helm chart repository cache:
helm repo update
- Install Cert-Manager:
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.11.0 --set installCRDs=true
- Check the installation:
kubectl get pods -n cert-manager
The output should be something like this:
NAME READY STATUS RESTARTS AGE
cert-manager-877fd747c-rffcd 1/1 Running 0 24s
cert-manager-cainjector-bbdb88874-788wh 1/1 Running 0 24s
cert-manager-webhook-5774d5d8f7-vc8jr 1/1 Running 0 24s
- Run the following command to add additional dnsConfig options to the cert-manager deployment:
kubectl edit deployment cert-manager -n cert-manager
Add the following lines under the dnsPolicy: ClusterFirst line:
dnsPolicy: ClusterFirst
dnsConfig:
  options:
    - name: ndots
      value: "1"
- Create an ACME HTTP validator manifest file (e.g. letsencrypt-http-validation.yaml):
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-http-staging
  namespace: cert-manager
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: user@example.com
    # Set ISRG Root X1 as the preferred chain
    preferredChain: ISRG Root X1
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-http-staging
    # Enable the HTTP-01 challenge provider
    solvers:
    # An empty 'selector' means that this solver matches all domains
    - selector: {}
      http01:
        ingress:
          class: nginx
          serviceType: ClusterIP
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-http-prod
  namespace: default
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: user@example.com
    # Set ISRG Root X1 as the preferred chain
    preferredChain: ISRG Root X1
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-http-prod
    # Enable the HTTP-01 challenge provider
    solvers:
    # An empty 'selector' means that this solver matches all domains
    - selector: {}
      http01:
        ingress:
          class: nginx
          serviceType: ClusterIP
Replace user@example.com with your email address. Apply this manifest file by using the following command:
kubectl apply -f letsencrypt-http-validation.yaml -n cert-manager
You can check the result by using the following command:
kubectl get clusterissuer
The output looks something like this:
NAME READY AGE
letsencrypt-http-prod True 7s
letsencrypt-http-staging True 7s
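As a sketch of how these ClusterIssuers are used (the hostname app.example.com, the Ingress name my-app, and the backend Service my-app:80 are placeholders, not part of this guide), an Ingress that requests a certificate from the staging issuer would look something like this:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    # Ask cert-manager to obtain a certificate from the staging ClusterIssuer
    cert-manager.io/cluster-issuer: letsencrypt-http-staging
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: my-app-tls
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 80
Once the staging certificate is issued successfully, you can switch the annotation to letsencrypt-http-prod to obtain a trusted production certificate.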