khanlab/hippunfold

long staging


Lately I've been submitting hippunfold jobs from within a `regularSubmit -j LongSkinny` job because building the DAG and staging take so long.
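For reference, the submission looks something like this (paths are placeholders; `regularSubmit` and the `LongSkinny` job template are from khanlab/neuroglia-helpers):

```bash
# Wrap the whole hippunfold run (DAG building and staging included) in its
# own long-running cluster job, so the login node isn't tied up for hours.
# /path/to/bids and /path/to/output are placeholders.
regularSubmit -j LongSkinny \
  hippunfold /path/to/bids /path/to/output participant --use-singularity
```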
Possible bottlenecks:

  • If my understanding is correct, separate jobs are being submitted for most (all?) rules, but I thought we had planned to submit one job per subject using Snakemake's `group: "subj"` mechanism. This must not be working; I wonder if there is some interaction due to the different resources or containers for each rule? (See the dry-run sketch after this list.)
  • Searching the input BIDS directory for the required files could be taking quite long. I think this relies on snakebids and pybids, but I wonder if it could be sped up?
  • Printing every job to the terminal is a bit much to look at and might be slowing things down a tad. Maybe we could suppress this.
  • Maybe I should experiment more with the `--group-components` flag, and we should recommend it in the readthedocs.
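As a first diagnostic for the grouping question, a dry run might help; this assumes hippunfold forwards extra options to Snakemake (it is a snakebids app), and the paths are placeholders:

```bash
# -n is Snakemake's dry run: it builds the DAG and prints the planned jobs
# without submitting anything, which makes it possible to see how rules are
# (or aren't) being grouped per subject before burning queue time.
hippunfold /path/to/bids /path/to/output participant -n
```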

I think we connected offline about this, but closing the loop here too:

The `group: "subj"` directive will indeed make sure only one cluster job is submitted per subject, but Snakemake still needs to do all the accounting for all the rules. The long delay when running on graham is usually related to the slow network filesystems (/project, /scratch, /home), especially when running on a large dataset. Snakemake writing to the .snakemake folder can also be slow if it is on a network filesystem.
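One possible workaround, sketched under the assumption that running from node-local storage keeps the `.snakemake` bookkeeping off the network filesystem (`$SLURM_TMPDIR` is the SLURM convention on clusters like graham; all paths are placeholders):

```bash
# Run from node-local disk so .snakemake writes hit fast local storage,
# then copy the results back to the network filesystem afterwards.
cd "$SLURM_TMPDIR"
hippunfold /path/to/bids ./hippunfold_out participant --use-singularity
cp -r ./hippunfold_out /scratch/$USER/
```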

I don't think the printing itself slows things down, and I agree it is a lot of text, but I'm not sure there is an easy way to suppress it (if it's not possible via a Snakemake option) without also suppressing other necessary information.
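It may be worth checking whether Snakemake's `--quiet` flag would pass through; in recent Snakemake versions it takes values like `rules` or `progress`, though whether it hides too much needs testing:

```bash
# Assumes hippunfold forwards the flag to Snakemake. In recent Snakemake
# releases, --quiet rules suppresses the per-job rule printout while keeping
# errors; the accepted values vary across versions, so verify locally.
hippunfold /path/to/bids /path/to/output participant --quiet rules
```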

The `--group-components` flag is mainly useful if you want more than one subject per cluster job (e.g. `--group-components subj=5` will group 5 subjects per job), which helps when you have e.g. >1000 subjects, since only 1000 jobs can be submitted at a time on graham. But it won't speed up the submission at all (though it might save some time overall if jobs spend less time waiting in the queue). Note: `--group-components` is mentioned in the docs, but under Contributing to HippUnfold -> Instructions for Compute Canada.
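Concretely, with placeholder paths, and assuming the `cc-slurm` profile from the Compute Canada instructions:

```bash
# Packs 5 subjects into each submitted cluster job, so e.g. 1000 subjects
# become ~200 queued jobs instead of 1000. Submission time is unchanged;
# only the number of queued jobs shrinks.
hippunfold /path/to/bids /path/to/output participant \
  --profile cc-slurm --group-components subj=5
```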

That said, the most efficient way to run hippunfold for a large number of subjects is with a wrapper script; this wrapper was made for exactly that purpose: https://github.com/akhanf/batch-hippunfold
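For anyone who finds this issue before the docs are updated, the idea behind the wrapper is roughly the following (a minimal sketch, not the actual batch-hippunfold code; the sbatch resource values are made up, and `--participant_label` is the standard snakebids subject filter):

```bash
#!/bin/bash
# Sketch only: submit one independent cluster job per subject, so DAG
# building and BIDS indexing happen in parallel inside each job instead
# of serially up front. Paths and sbatch resources are placeholders.
bids=/path/to/bids
out=/path/to/output
for subjdir in "$bids"/sub-*; do
  subj=$(basename "$subjdir" | sed 's/^sub-//')
  sbatch --time=12:00:00 --mem=32G --cpus-per-task=8 --wrap \
    "hippunfold $bids $out participant \
       --participant_label $subj --use-singularity --cores 8"
done
```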

I'm leaving this issue open as a reminder to point to this wrapper in the hippunfold docs.