OpenHPC: Beyond the Install Guide

Materials for "OpenHPC: Beyond the Install Guide" half-day tutorial. This is the pearc24 branch for PEARC24.

Infrastructure preparation is largely adapted (or copy-pasted) from Tim Middelkoop's ohpc-jetstream2 repo, plus the Jetstream2 documentation's CLI Overview and the sections that follow it.

The goal of this repository is to let instructors or self-learners construct one or more OpenHPC 3.x virtual environments that stay as close as possible to the defaults from the OpenHPC installation guide.

These environments use Rocky 9 x86_64, Warewulf 3, and Slurm.

Prerequisites to set this up yourself

  1. A copy of this repository.
  2. Vagrant on x86-64 with one of the following desktop hypervisors: VirtualBox, VMware, or Parallels. It might also work on Apple Silicon systems with VMware's desktop hypervisor, but this is currently untested.
  3. OpenStack CLI and API access (only tested with Jetstream2).
  4. An OpenStack RC file for your OpenStack environment (e.g., one generated by following Setting up application credentials and openrc.sh for the Jetstream2 CLI).

Repository structure

  1. README.md is what you're reading now.
  2. Vagrantfile provides settings for a consistent environment used to create the Jetstream2 infrastructure for the workshop.
  3. reference contains an unmodified copy of OpenHPC's recipe.sh and input.local files from Appendix A of OpenHPC (v3.1) Cluster Building Recipes, Rocky 9.3 Base OS, Warewulf/SLURM Edition for Linux (x86_64).
  4. repos contains third-party yum repositories for the Vagrant VM (currently only for opentofu).
  5. openstack-tofu contains Terraform/OpenTofu configuration files to build the HPC cluster structure for the instructor and the students, plus shell scripts to exchange data between the configuration output and Ansible.
  6. ansible contains Ansible playbooks, inventories, and host variables used to complete configuration of the HPC cluster installation for the instructor and the students.
  7. .vagrant will show up after you build the Vagrant VM. Its contents are all ignored.
  8. .gitignore controls which files are ignored by git. Probably no reason to modify it.

Setting up the workshop

OpenRC file

Copy the OpenRC file you got from Jetstream into the openstack-tofu folder. Make sure the OpenRC file is named to match the wildcard app-cred-*-openrc.sh. You should only have one OpenRC file in this folder.
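
For example (the credential file name below is hypothetical; use whatever Jetstream2 generated for you):

cp ~/Downloads/app-cred-workshop-openrc.sh openstack-tofu/   # hypothetical file name
ls openstack-tofu/app-cred-*-openrc.sh                       # should list exactly one file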

Container Setup (alternative to using Vagrant VM)

Build the container and start it with the following commands (or similar, depending on your environment):

docker build -t btig-tools --file Containerfile .
docker run -it --rm btig-tools

Vagrant VM

Run vagrant up from the top-level folder for this repository; this should create a Rocky 9 VM. The VM will install opentofu, the Python OpenStack clients, Ansible, xkcdpass, mtools to build an iPXE disk image, jq to process JSON data, and all their dependencies.

If no file named disk.img exists in the openstack-tofu folder, the VM will create one. The VM will also copy the OpenRC file from the openstack-tofu folder into a startup folder for the vagrant user, and it will ensure that the vagrant user does not inherit any SSH_AUTH_SOCK variables from the outside environment, as these can interfere with SSH connections to the management nodes. Finally, the VM will create an ssh key for the vagrant user and include its public key in the file ssh_key.tf in the openstack-tofu folder.

Once the VM has finished these steps, you can log into it with vagrant ssh and manage things from there. Test that you can access OpenStack by running openstack flavor list and confirming that it prints a list of OpenStack instance types.
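
A minimal session, assuming the VM provisioned cleanly, looks like this:

vagrant up                 # build and provision the Rocky 9 VM
vagrant ssh                # log into the VM
openstack flavor list      # inside the VM: should print a table of instance types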

OpenTofu initialization

First, cd /vagrant/openstack-tofu and run ./init.sh. This should initialize the project directory, and if no compute image named efi-ipxe exists in your OpenStack project, it will create one from disk.img, configured to use an e1000 network card.
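
For example, from inside the Vagrant VM (the grep is just a quick check that the image now exists):

cd /vagrant/openstack-tofu
./init.sh
openstack image list | grep efi-ipxe    # verify the efi-ipxe image was created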

OpenTofu settings

Next, create a file local.tf in the /vagrant/openstack-tofu folder. It should contain the following variables:

variable "outside_ip_range" {
    type = string
    default = "0.0.0.0/0"
}

variable "openstack_public_network_id" {
    type = string
    default = "3fe22c05-6206-4db2-9a13-44f04b6796e6"
    # no need to change this for any Jetstream2 allocation, looks like.
}

variable "n_students" {
    type = number
    default = 0
}

variable "nodes_per_cluster" {
    type = number
    default = 1
}

  1. outside_ip_range defines which IPs are allowed ssh and ping access to the HPC management nodes.
  2. openstack_public_network_id contains the ID of the "public" network at the edge of your OpenStack environment. On Jetstream2, it can be found by clicking the "public" name at the Project / Network / Networks entry for your project allocation. As I had the same network ID on two different projects on Jetstream2, this may be a constant value for everyone.
  3. n_students defines how many student clusters to set up (not including the cluster always set up for the instructors).
  4. nodes_per_cluster defines how many compute nodes to set up for each cluster.

You may need to increase your project allocation if adding (n_students+1)*nodes_per_cluster compute instances would exceed your compute instance limit.
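
As a worked example with hypothetical values:

# With n_students = 10 and nodes_per_cluster = 2 (hypothetical values):
#   compute instances:  (10 + 1) * 2 = 22
#   management nodes:    10 + 1      = 11
#   total instances:     22 + 11     = 33  <- must fit within your allocation's instance limit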

OpenTofu resource creation

Next, run ./create.sh in the /vagrant/openstack-tofu folder. This script will create:

  1. A router defining the boundary separating the OpenHPC-related resources and the outside world.
  2. An external network, subnet, and security group connecting all OpenHPC management nodes to the router.
  3. n_students+1 OpenHPC management nodes named sms-0 through sms-N, running Rocky 9 with 2 cores, 6 GB RAM, 20 GB disk space, and a public IPv4 address.
  4. n_students+1 separate internal networks and subnets to connect compute nodes to the OpenHPC management nodes. These have little to no network security enabled, similar to a purely internal HPC network.
  5. (n_students+1)*(nodes_per_cluster) OpenHPC compute nodes named clusterM-nodeN, each connected to the correct internal network.
  6. n_students+1 host entries in ~vagrant/.ssh/config, which enable ssh username@sms-N to automatically connect to the correct management node.

Additionally, the create.sh script will also:

  1. Retrieve the public IPv4 addresses for each OpenHPC management node.
  2. Remove any ssh host keys for those addresses stored in ~vagrant/.ssh/known_hosts.
  3. Populate Ansible host_vars files with compute node names and MAC addresses, usernames, and passwords for each cluster. You can adjust the number of user accounts created by changing the USERS_PER_HOST value in create.sh.
  4. Wait for every OpenHPC management node to respond to ssh connections.
  5. Print the public IPv4 addresses for each OpenHPC management node.
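
A typical run, assuming the OpenTofu settings above are in place, looks like:

cd /vagrant/openstack-tofu
./create.sh                    # builds the routers, networks, and instances, then waits for ssh
ssh rocky@sms-0 hostname       # afterwards, a quick check that the instructor node answers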

Ansible

To finish configuring the clusters, cd /vagrant/ansible. The Ansible folder contains six main playbooks, generally corresponding to sections of OpenHPC (v3.1) Cluster Building Recipes, Rocky 9.3 Base OS, Warewulf/SLURM Edition for Linux (x86_64):

  1. 0-undocumented-prereqs-unrelated-settings.yaml
  2. 2-install-base-os.yaml
  3. 3-install-openhpc-components.yaml
  4. a0-installation-template.yaml
  5. a1-run-recipe.yaml
  6. z-post-recipe.yaml

Run these manually one at a time in order, or in a for loop like for p in [023az]*.yaml; do ansible-playbook $p; done.
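
A slightly more defensive version of that loop, which stops at the first failing playbook, might look like:

cd /vagrant/ansible
for p in [023az]*.yaml; do
    ansible-playbook "$p" || break    # stop if a playbook fails
done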

0-undocumented-prereqs-unrelated-settings.yaml

Installs any programs needed for the default OpenHPC configuration files to work; in particular, the s-nail program needed for the MailProg=/bin/mail setting in /etc/slurm/slurm.conf.

Also configures a few more settings applicable to a multi-student workshop environment, but not necessarily required in every case:

  1. Adding the user1, ..., userN accounts, with membership in group wheel.
  2. Ensuring members of group wheel have password-less sudo access.
  3. Enabling password authentication for ssh connections and reloading the sshd service.

2-install-base-os.yaml

As the default Rocky 9 Jetstream images have a workable base operating system installed, completing section 2 of the installation guide only requires:

  1. Adding the management system's hostname and internal IP to /etc/hosts.
  2. Disabling SELinux.
  3. Setting the timezone to America/New_York.

Though stopping and disabling firewalld is part of section 2, it's already handled in the recipe.sh script from Appendix A, so we omit it here.
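
The playbook does this idempotently with Ansible modules; rough manual equivalents of the three steps above (the internal IP and hostname below are hypothetical) would be:

echo "10.1.1.1 sms-0" >> /etc/hosts                             # hypothetical internal IP and hostname
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config    # takes full effect after a reboot
timedatectl set-timezone America/New_York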

3-install-openhpc-components.yaml

Since we'll be using the recipe.sh script to perform the installation, we skip over most of the steps in section 3. Instead we:

  1. Install the OpenHPC repository release file.
  2. Enable the CodeReady Builder repository if needed.
  3. Install the docs-ohpc package to get a copy of recipe.sh and input.local.
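
Expressed as rough manual commands (a sketch; the playbook uses Ansible modules, and the exact OpenHPC release RPM URL comes from the install guide):

# step 1: install the OpenHPC release RPM using the URL given in the install guide (omitted here)
dnf config-manager --set-enabled crb    # step 2: enable CodeReady Builder
dnf -y install docs-ohpc                # step 3: provides recipe.sh and input.local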

a0-installation-template.yaml

This playbook makes copies of recipe.sh and input.local and modifies them, either to match the virtual environment or to make sure more things are working correctly when the students first connect. The changes include:

input.local

  1. Setting provision_wait=1 to spend less time waiting for remote node power-cycling with IPMI, since that's not supported in Jetstream2.
  2. Setting num_computes to the number of compute nodes in each cluster.
  3. Replacing MAC addresses for the compute nodes.
  4. Changing the slurm_node_config and update_slurm_nodeconfig variables to ensure slurm.conf has the correct values for the OpenStack instance type.
  5. Setting sms_name to the correct hostname of the management node (i.e., sms-N).
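
After the playbook runs, the modified input.local contains values along these lines (the hostname, count, and MAC address below are hypothetical, and the c_mac array name is based on the stock input.local template):

sms_name=sms-3                 # hypothetical management node
num_computes=2                 # hypothetical node count
c_mac[0]=fa:16:3e:aa:bb:01     # hypothetical OpenStack-assigned MAC
provision_wait=1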

recipe.sh

  1. Changing the CHROOT path from /opt/ohpc/admin/images/rocky9.3 to /opt/ohpc/admin/images/rocky9.4.
  2. Ensuring that the slurmd and munge services are enabled in the chroot.
  3. Removing unneeded pdsh commands.
  4. Replacing echo commands with idempotent ansible.builtin.lineinfile tasks for both /etc/exports and /etc/chrony.conf.
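
Item 2, expressed as a manual command against the modified CHROOT path (a sketch):

chroot /opt/ohpc/admin/images/rocky9.4 systemctl enable slurmd munge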

a1-run-recipe.yaml

This playbook simply runs the copy of recipe.sh with the environment variable OHPC_INPUT_LOCAL pointing to the modified copy of input.local. This will probably take around 10 minutes to run, and multiple management nodes can run the script simultaneously.
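
Conceptually, each management node runs something like the following (the file names here are hypothetical; the playbook uses the copies made by a0-installation-template.yaml):

export OHPC_INPUT_LOCAL=/root/input.local.workshop    # hypothetical path to the modified input.local
bash /root/recipe.workshop.sh                         # hypothetical path to the modified recipe.sh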

z-post-recipe.yaml

This playbook fixes a few things that can only be done after recipe.sh has run:

  1. Setting the timezone in the chroot to America/New_York and rebuilding the chroot.
  2. Removing duplicate ReturnToService lines from /etc/slurm/slurm.conf (this will be unnecessary once an OpenHPC release that includes PR 1994 is announced).
  3. Creating /var/log/slurmctld.log with correct ownership and permissions.
  4. Storing host ssh keys from the compute nodes in the management node's /etc/ssh/ssh_known_hosts to eliminate warnings on first ssh connections to the compute nodes.
  5. Rebooting the compute nodes to apply the updated system image from item 1.
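
Items 1 and 4, expressed as rough manual commands (a sketch; node names assume the c1-style naming used below):

chroot /opt/ohpc/admin/images/rocky9.4 ln -sf /usr/share/zoneinfo/America/New_York /etc/localtime
wwvnfs --chroot /opt/ohpc/admin/images/rocky9.4       # rebuild the Warewulf image after the change
ssh-keyscan c1 >> /etc/ssh/ssh_known_hosts            # repeat for each compute node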

Testing the workshop environment

Initial testing from the Vagrant VM:

  1. Run ssh rocky@sms-0 to log into the instructor management node.
  2. On the management node, run sinfo until you see node c1 in an idle state in the default normal partition.
  3. On the management node, run ssh c1 to get an expected error message of Access denied: user rocky (uid=1000) has no active jobs on this node.
  4. On the management node, run srun -n2 hostname, which should return two lines of c1, indicating that a two-task job running the hostname command completed on node c1.
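
Put together, a first test session from the Vagrant VM might look like:

ssh rocky@sms-0            # log into the instructor management node
sinfo                      # repeat until c1 shows as "idle" in the "normal" partition
ssh c1                     # expect "Access denied: user rocky (uid=1000) has no active jobs on this node"
srun -n2 hostname          # expect "c1" printed twice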

By default, each OpenHPC management node has three user accounts defined:

  1. rocky, which accepts logins using the vagrant account's ssh key.
  2. user1 and user2, which share the same password and allow password-based logins from anyone who has it.

The passwords for each cluster's user1 and user2 accounts are stored in the /vagrant/ansible/user-passwords.txt file. The passwords for cluster N can be found on line N+1 of that file (e.g., if the vagrant user runs ssh user2@sms-4, the password will be on line 5 of user-passwords.txt). All other users can ssh to the OpenHPC management nodes by public IP address. Each management node's public IP address can be found in the /vagrant/ansible/local.ini file.
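
For example, to look up the password for that ssh user2@sms-4 login from inside the Vagrant VM:

sed -n '5p' /vagrant/ansible/user-passwords.txt    # cluster 4's passwords are on line 4+1 = 5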

Thus, the sinfo, ssh, and srun commands above should also work outside the Vagrant VM with the user1 through userN accounts and passwords, as long as you SSH to the management node's public IP address.