This repo is now archived, and the materials have been moved here.
- Log in to the gateway:

  ```bash
  ssh USERNAME@ssh.swc.ucl.ac.uk
  ```

- Log in to the HPC:

  ```bash
  ssh hpc-gw1
  ```
N.B. You can set up SSH keys so you don't have to keep typing in your password.
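One way to set this up (a sketch using standard OpenSSH tools; adjust usernames and hostnames to your own setup):

```bash
# On your local machine: generate a key pair if you don't already have one
ssh-keygen -t ed25519

# Copy the public key to the gateway (you'll be asked for your password one last time)
ssh-copy-id USERNAME@ssh.swc.ucl.ac.uk

# From the gateway, repeat for the HPC login node if home directories are not shared
ssh-copy-id hpc-gw1
```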
- See what nodes are available:

  ```bash
  sinfo
  ```

  N.B. Adding this line to your `.bashrc` and typing `sinfol` will print much more information:

  ```bash
  alias sinfol='sinfo --Node --format="%.14N %.4D %5P %.6T %.4c %.10C %4O %8e %.8z %.6m %.8d %.6w %.5f %.6E %.30G"'
  ```
- See what jobs are running:

  ```bash
  squeue
  ```
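  `squeue` also accepts the usual SLURM filters if the full list is too long, for example:

  ```bash
  squeue -u $USER     # only your own jobs
  squeue -j 123456    # a specific job (placeholder job ID)
  ```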
- See what data storage is available:

  ```bash
  cd /ceph/store                              # general lab data storage
  cd /nfs/winstor                             # older lab data
  cd /ceph/neuroinformatics/neuroinformatics  # team storage
  cd /ceph/zoo                                # specific project storage
  cd /ceph/scratch                            # short-term storage, e.g. for intermediate analysis results; not backed up
  cd /ceph/scratch/neuroinformatics-dropoff   # "dropbox" for others to share data with the team
  ```
- Start an interactive job in pseudoterminal mode (`--pty`) by requesting a single core from SLURM, the job scheduler:

  ```bash
  srun -p cpu --pty bash -i
  ```

  N.B. Don't run anything intensive on the login nodes.
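  If you need more than a single core for interactive work, you can ask SLURM for more resources up front (a sketch; the core count, memory and time limit here are just examples):

  ```bash
  srun -p cpu -n 4 --mem 8G -t 2:00:00 --pty bash -i
  ```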
- Clone the test scripts and move into the repository:

  ```bash
  cd ~/
  git clone https://github.com/neuroinformatics-unit/slurm-demo
  cd slurm-demo
  ```
- Check the list of available modules:

  ```bash
  module avail
  ```

- Load the miniconda module:

  ```bash
  module load miniconda
  ```
- Create the conda environment:

  ```bash
  conda env create -f env.yml
  ```
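  To confirm the environment was created, list the environments conda knows about:

  ```bash
  conda env list
  ```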
- Activate the conda environment and run the Python script:

  ```bash
  conda activate slurm_demo
  python multiply.py 5 10 --jazzy
  ```
- Stop the interactive job:

  ```bash
  exit
  ```
- Check out the batch script:

  ```bash
  cat batch_example.sh
  ```

  ```bash
  #!/bin/bash

  #SBATCH -p gpu # partition (queue)
  #SBATCH -N 1 # number of nodes
  #SBATCH --mem 2G # memory pool for all cores
  #SBATCH -n 2 # number of cores
  #SBATCH -t 0-0:10 # time (D-HH:MM)
  #SBATCH -o slurm_output.out
  #SBATCH -e slurm_error.err
  #SBATCH --mail-type=ALL
  #SBATCH --mail-user=adam.tyson@ucl.ac.uk

  module load miniconda
  conda activate slurm_demo

  for i in {1..5}
  do
      echo "Multiplying $i by 10"
      python multiply.py $i 10 --jazzy
  done
  ```

  Run the batch job:

  ```bash
  sbatch batch_example.sh
  ```
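  Once submitted, the job can be monitored (and, if needed, cancelled) with standard SLURM commands; the job ID below is a placeholder for the one `sbatch` prints:

  ```bash
  squeue -u $USER    # is the job pending or running?
  sacct -j 123456    # accounting info once it has started or finished (placeholder job ID)
  scancel 123456     # cancel the job (placeholder job ID)
  ```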
- Check out the array script:

  ```bash
  cat array_example.sh
  ```

  ```bash
  #!/bin/bash

  #SBATCH -p gpu # partition (queue)
  #SBATCH -N 1 # number of nodes
  #SBATCH --mem 2G # memory pool for all cores
  #SBATCH -n 2 # number of cores
  #SBATCH -t 0-0:10 # time (D-HH:MM)
  #SBATCH -o slurm_array_%A-%a.out
  #SBATCH -e slurm_array_%A-%a.err
  #SBATCH --mail-type=ALL
  #SBATCH --mail-user=adam.tyson@ucl.ac.uk
  #SBATCH --array=0-9%4

  # Array job runs 10 separate jobs, but not more than four at a time.
  # This is flexible and the array ID ($SLURM_ARRAY_TASK_ID) can be used in any way.

  module load miniconda
  conda activate slurm_demo

  echo "Multiplying $SLURM_ARRAY_TASK_ID by 10"
  python multiply.py $SLURM_ARRAY_TASK_ID 10 --jazzy
  ```

  Run the array job:

  ```bash
  sbatch array_example.sh
  ```
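  As a sketch of one common pattern (not part of the demo repo), the task ID can be used to pick an input out of a list, here a hypothetical `file_list.txt` with one path per line:

  ```bash
  # Hypothetical: task IDs start at 0, sed line numbers at 1
  FILE=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" file_list.txt)
  echo "Processing $FILE"
  python process.py "$FILE"   # process.py is a placeholder for your own script
  ```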
- Start an interactive job with one GPU:

  ```bash
  srun -p gpu --gres=gpu:1 --pty bash -i
  ```

- Load CUDA:

  ```bash
  module load cuda
  ```
- Activate the conda environment and check that the GPU is visible:

  ```bash
  conda activate slurm_demo
  python
  ```

  ```python
  import tensorflow as tf
  tf.config.list_physical_devices('GPU')
  ```
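  Alternatively, outside Python, `nvidia-smi` should list the GPU that SLURM has allocated to the job:

  ```bash
  nvidia-smi
  ```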
- For fast I/O, consider copying data to `/tmp` (fast NVMe storage) as part of the run, as sketched below. This is available on all of the gpu-380 and gpu-sr670 nodes.
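  A minimal sketch of that pattern inside a batch script (all paths and script names below are placeholders, not part of the demo repo):

  ```bash
  # Stage input data on the node-local NVMe, compute, then copy results back
  SCRATCH_DIR=/tmp/$USER/$SLURM_JOB_ID
  mkdir -p "$SCRATCH_DIR"
  cp -r /ceph/scratch/my_dataset "$SCRATCH_DIR/"   # placeholder input path

  python analyse.py "$SCRATCH_DIR/my_dataset" -o "$SCRATCH_DIR/results"   # placeholder script

  cp -r "$SCRATCH_DIR/results" /ceph/scratch/my_results   # placeholder output path
  rm -rf "$SCRATCH_DIR"                                   # clean up the local disk
  ```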