carpentries-incubator/hpc-intro

Tiny self-hosted cluster for HPC Carpentry workshop?


Hi folks,

Disclaimer first: I am tossing this idea out for now and will have to do the testing later. Background: I was tasked with testing out "ColdFront" as part of my primary duties, and its tutorial comes with a set of Docker containers that together create a workable HPC environment (complete with Open OnDemand, XDMoD, and ColdFront services!). You can see the environment here: https://github.com/ubccr/hpc-toolset-tutorial/blob/master/docs/getting_started.md . It comes with a login node, two compute nodes, and the ancillary containers mentioned above, plus LDAP and database server(s).
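For the curious, bringing the whole thing up should be close to a one-command affair; the getting_started doc linked above drives everything through the repo's hpcts wrapper script (my recollection of the exact subcommands may be off, so check the doc):

# Grab the tutorial and launch the full container cluster
git clone https://github.com/ubccr/hpc-toolset-tutorial
cd hpc-toolset-tutorial
./hpcts start

# ...and tear it all down again when done
./hpcts stop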

It just came to my mind that we could repurpose this "hpc toolset" container set into a standalone "HPC-like" environment on a multicore laptop / desktop / workstation / server, for users who just don't have an alternative HPC environment. (Sidebar: in my latest "intro to HPC" teaching I found that some learners have 6-core and 8-core laptops, which was unheard of before; one participant even had an M2 MacBook Air with 4 efficiency cores and 4 performance cores.) The increased number of cores, memory, and storage on modern laptops may make it viable to run a tiny HPC environment on a learner's own hardware, with a number of caveats mentioned below (and more that I haven't thought of yet).

This is rather easy to set up (for a fairly capable Linux user like me) but probably trickier to get right, since it all runs in containers: the compute-node containers may be difficult to pin to a specific set of physical cores (don't take me too seriously here, as I am no Docker expert). In that sense it is hard to reproduce the real "HPC" experience, where compute nodes have dedicated cores to process stuff, and this will skew computation timings (think: the parallel Pi calculation). On the other hand, this containerized environment is quite easy to set up on Linux, Mac, or even Windows (nowadays with WSL), so individual users can actually have it running on their own (capable) laptops.
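On the core-pinning question: Docker does let you restrict a container to specific host cores via the --cpuset-cpus flag, so the timing skew might be reducible. A minimal sketch, assuming compute-node containers named cpn01 and cpn02 (hypothetical names, not necessarily what the toolset uses):

# Pin each compute-node container to its own pair of physical cores
docker update --cpuset-cpus 0,1 cpn01
docker update --cpuset-cpus 2,3 cpn02

# The same flag also works at creation time:
#   docker run --cpuset-cpus 0,1 --name cpn01 ...

(On macOS, Docker runs inside a VM, so this pins virtual CPUs rather than the physical efficiency/performance cores of an M2.)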

Clearly there is some ironing out to do before "HPC in a container" works in a turn-key manner. But it looks promising to me. As I continue working on my original project I may be able to tell whether this is indeed a useful ad-hoc solution.

Wirawan

For a similar goal, but a different implementation: the following Vagrantfile lets an instructor build an OpenHPC management system (SMS) and a compute node with VirtualBox and Vagrant. It needs 2 GB of RAM for the management system plus 4 GB per compute node, and it lets learners ssh into the management VM on port 2222 of the instructor's system.

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|

  # SMS server
  config.vm.define "sms", primary: true do |sms|
    #sms.vm.box = "generic/rocky8"
    sms.vm.box = "bento/rockylinux-8"
    sms.vm.hostname = "sms"
    # sms.vm.synced_folder ".", "/vagrant", disabled: true
    sms.vm.network "private_network", ip: "172.16.0.1", netmask: "255.255.0.0", virtualbox__intnet: "XCBC"
    sms.vm.network "forwarded_port", guest: 22, host: 2222
    sms.vm.provision "shell", inline: <<-SHELL
      YUM="yum -q -y"
      sms_ip="$(nmcli device show eth1 | grep IP4.ADDRESS | awk '{print $NF}' | cut -d/ -f1)"
      # Point the hostname at the private-network IP instead of loopback
      sed -i -e "\\$s/127.0.1.1/${sms_ip}/" /etc/hosts
      echo "Yum updates"
      ${YUM} update
      echo "OHPC repo"
      ${YUM} install http://repos.openhpc.community/OpenHPC/2/EL_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm
      ${YUM} install dnf-plugins-core
      ${YUM} config-manager --set-enabled powertools
      echo "OHPC docs install"
      ${YUM} install docs-ohpc perl
      echo "Fix recipe settings"
      perl -pi.bak -e \
        's/c_mac\\[0\\]=00:1a:2b:3c:4f:56/c_mac\\[0\\]=08:00:27:00:00:01/;s/c_mac\\[1\\]=00:1a:2b:3c:4f:56/c_mac\\[1\\]=08:00:27:00:00:02/;s/c_mac\\[2\\]=00:1a:2b:3c:4f:56/c_mac\\[2\\]=08:00:27:00:00:03/;s/c_mac\\[3\\]=00:1a:2b:3c:4f:56/c_mac\\[3\\]=08:00:27:00:00:04/;s/eth_provision:-eth0/eth_provision:-eth1/' \
        /opt/ohpc/pub/doc/recipes/rocky8/input.local
      echo "OHPC recipe.sh"
      /opt/ohpc/pub/doc/recipes/rocky8/x86_64/warewulf/slurm/recipe.sh
      perl -pi.bak -e 's/Sockets=2 CoresPerSocket=8 ThreadsPerCore=2/Sockets=1 CoresPerSocket=1 ThreadsPerCore=1/' /etc/slurm/slurm.conf
      systemctl restart slurmctld
    SHELL
  end

  # Compute servers; widen the (1..1) range below to define more nodes
  (1..1).each do |compute_idx|
    config.vm.define "c#{compute_idx}", autostart: false do |compute|
      compute.vm.box = "clink15/pxe"
      # compute.vm.hostname = "c#{compute_idx}"
      compute.vm.network "private_network", virtualbox__intnet: "XCBC", mac: "08002700000#{compute_idx}", auto_config: false
      compute.ssh.insert_key = false
      compute.vm.allow_fstab_modification = false
      compute.vm.allow_hosts_modification = false
      # The PXE-booted node never comes up over SSH for Vagrant, so don't wait
      compute.vm.boot_timeout = 1
      compute.vm.provider "virtualbox" do |vb|
        vb.customize ["modifyvm", :id, "--nicbootprio2", "1"]
        vb.memory = "4096"
      end
      compute.vm.synced_folder ".", "/vagrant", disabled: true
    end
  end

end
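A hedged sketch of how one would drive this (the boot-timeout error on the compute node is expected by my reading of boot_timeout = 1 above, since that node PXE-boots and is provisioned by warewulf; the instructor hostname is a placeholder):

# Build and provision the management system (runs the full OpenHPC recipe)
vagrant up sms

# Power on the compute node; it has autostart: false, so name it explicitly.
# Vagrant will report a boot timeout -- expected, the node PXE-boots instead.
vagrant up c1

# Learners then connect through the forwarded port:
ssh -p 2222 vagrant@instructor-host.example.org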

One could just look at what SchedMD does for their Slurm tutorials: https://gitlab.com/SchedMD/training/docker-scale-out/

That should give you a working Slurm cluster, and with a little bit of effort you could get EESSI on there which would give you a software stack.
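For reference, getting EESSI onto such a cluster mostly means installing CernVM-FS plus the EESSI configuration package on each node and then sourcing the init script. A minimal sketch for an EL8-flavored node, following the EESSI documentation (package URLs and the stack version may drift):

# Install CernVM-FS and the EESSI repository configuration
sudo yum install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo yum install -y cvmfs
sudo yum install -y https://github.com/EESSI/filesystem-layer/releases/download/latest/cvmfs-config-eessi-latest.noarch.rpm

# Minimal client configuration, then set up the mount
echo 'CVMFS_CLIENT_PROFILE="single"' | sudo tee -a /etc/cvmfs/default.local
echo 'CVMFS_QUOTA_LIMIT=10000' | sudo tee -a /etc/cvmfs/default.local
sudo cvmfs_config setup

# Activate the EESSI software stack in the current shell
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module avail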