This ARM template is inspired by Christian Smith's templates:
- BeeGFS template: https://github.com/smith1511/hpc/tree/master/beegfs-shared-on-centos7.2
- Slurm template: https://github.com/smith1511/hpc/tree/master/slurm-on-centos7.1-hpc
It merges both templates.
It deploys, on the same set of VMs:
- a BeeGFS cluster with metadata and storage nodes
- Slurm as the job scheduler
- Fill in the mandatory parameters.
- Select an existing resource group or enter the name of a new resource group to create.
- Select the resource group location.
- Accept the terms and agreements.
- Click Create.
The VM called storage0 is:
- the BeeGFS metadata server and management host
- the Slurm master
- the NFS server, exporting the shared directories /share/home and /share/data
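The corresponding entries in /etc/exports on storage0 might look like the following (a sketch only; the actual export options are set by the template's deployment script):

```
/share/home *(rw,no_root_squash)
/share/data *(rw,no_root_squash)
```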
The VMs called storage[1-n] are:
- BeeGFS storage servers
- [Optional] some of them may also be BeeGFS metadata servers (depending on the template parameters)
- Slurm compute nodes
The BeeGFS storage is mounted on /share/scratch on every node.
By default, each compute node has 1 core available to Slurm.
You should edit the slurm.conf file to reflect the real number of CPUs:
NodeName=storage[1-number_of_nodes] Procs=16
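One way to do this is to substitute the output of nproc into the Procs= value. A minimal sketch, shown here on a local copy of the file (the node range [1-4] is an assumption; adapt the path, typically /etc/slurm/slurm.conf, and the range to your deployment):

```shell
# Create a local copy standing in for /etc/slurm/slurm.conf
# (the [1-4] range is a placeholder for your real node count)
cat > slurm.conf <<'EOF'
NodeName=storage[1-4] Procs=1
EOF
# Replace the Procs= value with the core count reported by the OS
sed -i "s/Procs=[0-9][0-9]*/Procs=$(nproc)/" slurm.conf
cat slurm.conf
```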
Then restart the slurm daemon:
systemctl restart slurmctld
And bring the nodes online with scontrol:
scontrol: update NodeName=storage1 State=RESUME
scontrol: update NodeName=storage2 State=RESUME
scontrol: exit
Then check the result with:
sinfo -N -l
Simply SSH to the master node using its public IP address:
# ssh [user]@[public_ip_address]
You can log into the master node using the admin user and password specified at deployment.
TODO:
- check that all packages installed by the install_pkgs_slurm function in deployazure.sh are mandatory
- let the user choose how many data disks per VM
- use VMSS instead of VMs
- use Ganglia for monitoring
- enable MPI if RDMA instances are used, and use the CentOS HPC images