This repository is a minimal working example of how to:
- set up Hydra
- set up a batch of SLURM jobs on top of Hydra via submitit-launcher
⚠️ You need to install hydra-core for this step.
Hydra is fairly easy to set up:
- one .yaml configuration file containing the default config values
- a @hydra.main decorator on your main experiment function, which passes the configuration values to it as an argument.
By simply running python slurm_hydra_submitit/script.py, you'll see how the main function takes the arguments from the configuration file and passes them to the underlying functions.
⚠️ You need to install hydra-submitit-launcher for this step.
Now that our Hydra configuration is set up, we want to run the job on a SLURM cluster instead of our local computer. For that, we need to:
- specify the hydra launcher to work on the SLURM cluster
- specify the hardware specifications for the SLURM job
Once you are connected to your SLURM cluster's scheduler node, installing hydra-submitit-launcher is all you need to launch jobs on the cluster with:
python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm
To test locally before sending to the cluster, you can switch the hydra/launcher argument to submitit_local.
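The choice between the two launchers can also be automated. The sketch below is a hypothetical helper, not part of this repository or of the launcher plugin: it assumes that finding the `sbatch` executable on the PATH is a good enough sign that you are on a SLURM node, and it only builds the override string rather than launching anything.

```python
import shutil


def pick_launcher(which=shutil.which):
    """Return submitit_slurm when sbatch is on PATH, else submitit_local.

    Assumption: the presence of `sbatch` means a SLURM scheduler is reachable.
    The `which` parameter is injectable so the choice can be tested.
    """
    return "submitit_slurm" if which("sbatch") else "submitit_local"


def launcher_override(which=shutil.which):
    """Build the hydra/launcher override to append to the command line."""
    return f"hydra/launcher={pick_launcher(which)}"


print(launcher_override())
```

The printed override can be appended to python slurm_hydra_submitit/script.py --multirun in place of the hard-coded launcher name.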
You can easily adapt the SLURM parameters by overriding the SLURM launcher arguments.
For example, the following command requests 10 CPUs per task:
python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm hydra.launcher.cpus_per_task=10
You can launch multiple jobs at once by specifying several comma-separated values for a parameter in the launch command.
For example, the following command launches 4 jobs, which correspond to all the possible combinations of the arguments.
python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm project_name=P1,P2 train.epochs=30,40
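To see where the 4 jobs come from, the sweep can be mimicked in plain Python as the Cartesian product of the comma-separated values. This is only a sketch of the counting: it does not call Hydra, and the `sweep` dictionary and `expand_grid` helper are illustrative names, not part of this repository.

```python
from itertools import product

# Comma-separated values from the command above, one list per swept key.
sweep = {
    "project_name": ["P1", "P2"],
    "train.epochs": [30, 40],
}


def expand_grid(sweep):
    """Return one list of key=value overrides per combination of values."""
    keys = list(sweep)
    return [
        [f"{key}={value}" for key, value in zip(keys, values)]
        for values in product(*(sweep[key] for key in keys))
    ]


jobs = expand_grid(sweep)
for index, overrides in enumerate(jobs):
    print(index, " ".join(overrides))
```

Two values for each of two parameters give 2 x 2 = 4 combinations, so adding a third value to either parameter would raise the count to 6.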
Alternatively, you can pass sets of parameters to test together:
python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm +compile="{project_name:P1,train.epochs:30}, {project_name:P2,train.epochs:40}"
To clean up this command a bit, we can create a bash script similar to this:
#!/bin/bash
params=(
'{project_name:P1,train.epochs:10},'
'{project_name:P2,train.epochs:20}'
)
python slurm_hydra_submitit/script.py --multirun hydra/launcher=submitit_slurm +compile="${params[*]}"
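If you would rather keep the parameter sets in Python than in bash, the same command can be assembled as follows. This is a minimal sketch: `param_sets`, `format_param_set`, and `build_command` are hypothetical names, and the code only builds and prints the command instead of running it.

```python
import shlex

# Parameter sets to run together; the values mirror the bash script above.
param_sets = [
    {"project_name": "P1", "train.epochs": 10},
    {"project_name": "P2", "train.epochs": 20},
]


def format_param_set(params):
    """Render one dict as {key:value,key:value}."""
    body = ",".join(f"{key}:{value}" for key, value in params.items())
    return "{" + body + "}"


def build_command(param_sets, launcher="submitit_slurm"):
    """Assemble the argument list for the multirun command."""
    sweep = ", ".join(format_param_set(params) for params in param_sets)
    return [
        "python",
        "slurm_hydra_submitit/script.py",
        "--multirun",
        f"hydra/launcher={launcher}",
        f"+compile={sweep}",
    ]


command = build_command(param_sets)
# shlex.join quotes the arguments so the line can be pasted into a shell.
print(shlex.join(command))
```

On the scheduler node, the resulting list could be passed to subprocess.run(command, check=True) to actually submit the jobs.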