Create a virtualenv and install the tiny requirements:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
To run multiple experiments, create an `experiments.yaml` file with a list of datasets and models; see the `cfg_templates.py` file for the required fields. One can check this train config for a training example and this val config for a validation one.
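The field names below are illustrative assumptions, not the actual schema (the real required fields are defined in `cfg_templates.py`); a hypothetical `experiments.yaml` might look like:

```yaml
# Hypothetical sketch -- check cfg_templates.py for the real required fields
datasets:
  - dataset_a
  - dataset_b
models:
  - model_x
  - model_y
```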
Once that is in place, one should run:

```bash
python -m src.generate_scripts --cfg val_experiments.yaml
```
Let us assume that in `experiments.yaml` one has set the following:

```yaml
...
sbatch_config:
  experiments_list_file_path: "exps_list.txt"
  sbatch_script_file_path: "sbatch_script.sbatch"
...
```
This will create two files:
- an experiments list file, `exps_list.txt`, which holds all the commands needed to run the experiments;
- an sbatch script file, `sbatch_script.sbatch`, which starts a Slurm job array with as many tasks as there are dataset/model pairs in the `experiments.yaml` file.
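The generated sbatch script itself is not shown here, but a job array that runs one command per task is typically dispatched by picking line `$SLURM_ARRAY_TASK_ID` of the list file. A minimal sketch, assuming one command per line in `exps_list.txt` (the file name and commands below are stand-ins):

```shell
# Stand-in for the generated exps_list.txt (one experiment command per line)
EXPS_LIST=$(mktemp)
printf 'cmd A\ncmd B\ncmd C\n' > "$EXPS_LIST"

# Slurm sets this for each task of the array; hard-coded here for the demo
SLURM_ARRAY_TASK_ID=2

# Select the command for this array task (sed -n "<N>p" prints only line N)
CMD=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$EXPS_LIST")
echo "$CMD"   # the real script would run this, e.g. via srun

rm -f "$EXPS_LIST"
```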
Now one should be able to start all the jobs with the command:

```bash
sbatch sbatch_script.sbatch  # autorestart coming soon
```
NOTE: the log paths are created when generating the sbatch scripts, not when running the actual experiments. Don't delete them.
One can try using autorestart like this:

```bash
# tune --check-interval-secs depending on the job
python -m src.autorestart_job_array \
    "sbatch sbatch_script.sbatch" \
    --check-interval-secs 3 \
    --output-file-template "slurm_logs/slurm-{job_id}_{array_task_id}.out" \
    --verbose 1
```
NOTE: make sure that `--output-file-template <this_path>` matches `sbatch_config.output: <this_path>`. To do this, one needs to change `{job_id}` -> `%A` and `{array_task_id}` -> `%a`; so, when using `--output-file-template slurm-{job_id}_{array_task_id}.log`, the config will need:

```yaml
sbatch_config:
  output: slurm-%A_%a.log
```
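The placeholder substitution above is mechanical, so it can be checked with a one-liner (a sketch; the variable names are stand-ins, and this is not part of the repo's tooling):

```shell
# Convert the Python-style template placeholders used by --output-file-template
# into the %A (job ID) / %a (array task ID) tokens that sbatch expects.
TEMPLATE="slurm_logs/slurm-{job_id}_{array_task_id}.out"
SBATCH_OUTPUT=$(printf '%s' "$TEMPLATE" | sed -e 's/{job_id}/%A/' -e 's/{array_task_id}/%a/')
echo "$SBATCH_OUTPUT"   # slurm_logs/slurm-%A_%a.out
```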
TODO:
- Add option to ignore experiments
- More testing
- Improve resume