Vagrant-based system to spawn a ready-to-use multi-node Kubernetes/Slurm VM cluster, usable for testing, learning, and development.
The only prerequisites are basic virtualization tools, in particular Vagrant and its dependencies. Virtualization is done with `libvirt` and `qemu-kvm` on Linux systems, but it should be possible (although not tested) to use other providers such as VirtualBox or VMware.
On a RHEL-based system you can install them with:
```bash
sudo dnf install -y $(sed -r '/^#/d' requirements.txt)
```
and enable the virtualization services:
```bash
sudo systemctl enable --now libvirtd
sudo usermod -aG libvirt $(whoami)
```

and install the `vagrant-libvirt` provider plugin:

```bash
vagrant plugin install vagrant-libvirt
```
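Note that the group change from `usermod` only takes effect after you log out and back in. A quick way to verify that everything is in place (these are standard checks, not part of the repository's scripts):

```bash
systemctl is-active libvirtd   # should print "active"
vagrant plugin list            # should list vagrant-libvirt
```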
Although it's not strictly necessary, I highly recommend installing the `vagrant-scp` plugin to easily copy files to/from the VMs:
```bash
vagrant plugin install vagrant-scp
```
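For example, once the VMs are up you can copy a local file to the master node (the file name here is just a placeholder):

```bash
vagrant scp ./some-local-file kube-00:/home/vagrant/
```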
The VMs are defined in the `Vagrantfile` and can be customized to your needs.
- By default the cluster is composed of 1 master node (`kube-00`) and 2 worker nodes.
- The VMs are based on Fedora cloud images.
- `k8s` is used as the Kubernetes cluster manager.
- The default user is `vagrant` with password `vagrant`.
- In the home of the `vagrant` user, the `shared` directory is mounted as a folder shared among the nodes.
- Feel free to change the number of nodes and resources, but be aware that:
  - To run Kubernetes the minimum requirements are 2 GB of RAM and 2 CPUs.
  - You need to manually adjust the files `slurm_install.sh` and `slurm_worker.sh` to reflect the number of nodes and their names. This should be done both at the beginning of the file, where the node names are defined in `/etc/hosts`, and at the end, where the `slurm.conf` file is created (a quick way to locate these spots is sketched right after this list).
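As a rough aid (the exact patterns are an assumption and depend on the scripts' contents), something like the following can help you spot the lines that hard-code node names and the `slurm.conf` template:

```bash
grep -nE 'kube-|NodeName|PartitionName' slurm_install.sh slurm_worker.sh
```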
To start the VMs, run:
```bash
sudo virsh net-define scripts/kube-net.xml
sudo virsh net-start kube-net
vagrant up --provider=libvirt --no-parallel
```
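Once provisioning finishes, you can check that the network and all the VMs are up:

```bash
sudo virsh net-list   # kube-net should be listed as active
vagrant status        # all nodes should be "running"
```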
Once the VMs are up and running, you can access them with:
```bash
vagrant ssh kube-00
```
Note that `vagrant` takes care of the SSH keys and IP addresses of the VMs. However, if you need to access the VMs directly (e.g. to use the SSH extension for VS Code), you can use the SSH keys available in the `ssh` directory.
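For example, `vagrant ssh-config` prints the hostname, port, user, and private-key path that Vagrant uses for each VM; appending it to your `~/.ssh/config` makes the nodes visible to the VS Code SSH extension:

```bash
vagrant ssh-config kube-00 >> ~/.ssh/config
```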
You can control the cluster using the `kubectl` command. You don't need to SSH into the master node every time: you can use the kubeconfig file available in the `vagrant` user's home directory. To use it, copy it to your local machine with:
```bash
vagrant scp kube-00:/home/vagrant/.kube/config ./playground_kubeconfig.yaml
```
and then set the `KUBECONFIG` environment variable to point to the file:
```bash
export KUBECONFIG=$(pwd)/playground_kubeconfig.yaml
```
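You can then verify that the cluster is reachable from your local machine:

```bash
kubectl get nodes   # nodes will report NotReady until a CNI is installed (see below)
```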
By default no Container Network Interface (CNI) plugin is installed. You can choose between `flannel` and `calico` by running (after SSHing into the master node):
```bash
/home/vagrant/deploy_calico.sh
```

or

```bash
/home/vagrant/deploy_flannel.sh
```

Both scripts are already pre-configured for this cluster.
**Important**: once the CNI is installed (you can check it using `kubectl` or `k9s`), you must restart all the nodes to apply the changes. To do that, just run:
```bash
vagrant reload
```
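After the reload, the CNI pods should be running and the nodes should turn `Ready` (the exact pod and namespace names depend on the CNI you chose):

```bash
kubectl get pods -A | grep -iE 'flannel|calico'
kubectl get nodes
```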
To switch between CNIs or to uninstall the current one, you can run:

```bash
helm uninstall flannel --namespace kube-flannel
```

or

```bash
kubectl delete -f /home/vagrant/calico.yaml
```

and then restart the nodes.
The cluster is also configured to run Slurm, a job scheduler and resource manager for HPC systems. All the nodes belong to a single `debug` partition.
**Remark**: at the moment, due to some issues, Slurm only works for the `root` user. Enabling it for non-root users is a future TODO. Since this environment is meant for testing and learning purposes, this limitation should not be a big deal.
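As a quick smoke test, run something like the following from the master node as `root` (assuming the Slurm controller runs there):

```bash
sudo sinfo                 # the debug partition with all nodes should be listed
sudo srun -N 2 hostname    # run a trivial job across 2 nodes
```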
- [ ] Optimize the automatic deployment using `Ansible` and `kubespray`
- [ ] Enable Slurm for non-root users
- [ ] Add more CNIs (e.g. `cilium`)