elsasserlab/minute

Store intermediate files on $TMPDIR (/scratch)

marcelm opened this issue · 3 comments

It is an UPPMAX recommendation, and also in our best interest, to store intermediate files on the local node (in $TMPDIR).

However, when we let Snakemake submit jobs to SLURM, the files need to be available on the shared filesystem anyway for the next step, so it may not help that much.

cf #70

I think it can help more in some cases than in others, and I don't think it will do much harm otherwise.

The demultiplexing step seems to be having quite some trouble, maybe because it constantly updates a whole set of files at once, and these files can be quite big, so delays may accumulate. (This may not make sense; it really depends on how cutadapt implements the writes. I am guessing based on how I see the file sizes updating.)

Maybe doing the demultiplexing in $TMPDIR and moving the files to the shared filesystem at the end would be cheaper in time than it is now.
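That pattern could be sketched like this. This is a minimal illustration only: `demux_via_scratch`, `run_demux`, and the file names are invented here; the real rule would wrap the actual cutadapt invocation instead of a Python callable.

```python
import os
import shutil
import tempfile
from pathlib import Path


def demux_via_scratch(run_demux, final_dir):
    """Run a demultiplexing step in node-local scratch ($TMPDIR when set,
    the system default otherwise), then move its outputs to the shared
    filesystem in one pass at the end.

    run_demux: a callable that writes its output files into the directory
        it is given (a stand-in here for the real cutadapt call).
    final_dir: target directory on the shared filesystem.
    """
    scratch = os.environ.get("TMPDIR", tempfile.gettempdir())
    final_dir = Path(final_dir)
    final_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    with tempfile.TemporaryDirectory(dir=scratch) as workdir:
        run_demux(Path(workdir))
        # Only one burst of traffic to the shared filesystem per job,
        # instead of constant small updates to many open output files.
        for f in sorted(Path(workdir).iterdir()):
            target = final_dir / f.name
            shutil.move(str(f), target)
            moved.append(target)
    return moved
```

The point of the design is that the constantly-updated intermediate files only ever live on local scratch; the network filesystem sees a single sequential copy per job.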

I have to admit I cannot understand how the demultiplexing step would actually “hammer” the file system in this way. When I test this on a Rackham node, Cutadapt writes on average around 1 MB/s compressed data to the file system – which is really not a lot.

What I did notice is that the processing speed varies a bit when I write the output data to the network file system. There’ll be short “dips” in the rate of reads written. I don’t see those when I write to $TMPDIR (/scratch), but in the end the difference in total runtime is still just something like 3%.
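One rough way to make such dips visible is to sample the growth of the compressed output file while cutadapt runs. This is a hypothetical helper, not something the pipeline contains:

```python
import time
from pathlib import Path


def sample_write_rates(path, interval=1.0, samples=5):
    """Sample how fast a file grows (e.g. cutadapt's compressed output).

    Returns one bytes-per-second figure per interval; a short "dip"
    shows up as an interval with a much lower rate than its neighbours.
    """
    path = Path(path)
    rates = []
    prev = path.stat().st_size
    for _ in range(samples):
        time.sleep(interval)
        cur = path.stat().st_size
        rates.append((cur - prev) / interval)
        prev = cur
    return rates
```

Note this measures the size of the file as the filesystem reports it, so compression and buffering inside cutadapt smooth the numbers somewhat; it is only good for spotting stalls, not for precise throughput.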

Is it possible that you did your tests while the cluster had filesystem problems?

I am closing this one as well, because it is again UPPMAX-specific and it works well now.