r2dt-bio/R2DT

Temp file collision causing cluster jobs to hang

afg1 opened this issue · 3 comments

afg1 commented

I've come across a bit of an edge case where the naming of the tempfile by tRNAscan-SE is suboptimal, leading to jobs hanging waiting for user input.

I'm running the R2DT container (converted to Singularity) within a Nextflow pipeline, with LSF as the job submission system.

Having looked at the code in tRNAscan-SE responsible for naming the tempfile, I'm not really sure how this is possible. It would appear that two Perl interpreters on a given host end up with the same PID at the same time. I suspect container peculiarities, but I don't know enough to say for sure.

Here's the output I found:

tRNAscan-SE v.2.0.5 (October 2019) - scan sequences for transfer RNAs
Copyright (C) 2019 Patricia Chan and Todd Lowe
                   University of California Santa Cruz
Freely distributed under the GNU General Public License (GPLv3)

------------------------------------------------------------
Sequence file(s) to search:        output/subset.fasta
Search Mode:                       Archaeal
Results written to:                output/gtrnadb/A-subset.txt
Output format:                     Tabular
Searching with:                    Infernal First Pass->Infernal
Isotype-specific model scan:       Yes
Scan for noncanonical introns
Covariance model:                  /usr/local/lib/tRNAscan-SE/models/TRNAinf-arch.cm
                                   /usr/local/lib/tRNAscan-SE/models/TRNAinf-arch-SeC.cm
Infernal first pass cutoff score:  10

Temporary directory:               /hps/scratch/lsf_tmpdir/hl-codon-123-01
------------------------------------------------------------

No tRNAs found.


WARNING: /hps/scratch/lsf_tmpdir/hl-codon-123-01/tscan3451328.fpass exists already.

 (O)verwrite file, (A)ppend to file, or (Q)uit program? 
Reply (O)verwrite (A)ppend, or (Q)uit [O/A/Q]: 
Reply (O)verwrite (A)ppend, or (Q)uit [O/A/Q]: 
Reply (O)verwrite (A)ppend, or (Q)uit [O/A/Q]: 
Reply (O)verwrite (A)ppend, or (Q)uit [O/A/Q]: 

... ad infinitum

Even better, the endlessly repeated prompt produces a Nextflow log file that eats all available disk space and then crashes the controlling process.
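One way to keep a runaway prompt from filling the disk is to run the tool through a small wrapper that closes stdin and kills the process once its output passes a size limit. The following is only a sketch: `run_with_output_cap` and its `max_bytes` default are hypothetical names and values chosen for illustration, not part of R2DT, Nextflow, or tRNAscan-SE.

```python
import subprocess
import sys


def run_with_output_cap(command, max_bytes=1_000_000):
    """Run `command` with no stdin; kill it if its output exceeds max_bytes.

    Hypothetical helper for illustration. Returns a tuple of
    (return code, captured output bytes, whether the cap was exceeded).
    """
    proc = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,   # an interactive prompt can never be answered
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr so both streams share one cap
    )
    captured = bytearray()
    exceeded = False
    while True:
        # read1 returns as soon as some data is available (or b"" at EOF)
        chunk = proc.stdout.read1(65536)
        if not chunk:
            break
        captured.extend(chunk)
        if len(captured) > max_bytes:
            exceeded = True
            proc.kill()             # stop the runaway process
            break
    proc.stdout.close()
    proc.wait()
    return proc.returncode, bytes(captured[:max_bytes]), exceeded


if __name__ == "__main__":
    # Stand-in for a tool that repeats a prompt forever
    looping = [sys.executable, "-c", "while True: print('Reply [O/A/Q]:')"]
    code, output, exceeded = run_with_output_cap(looping, max_bytes=10_000)
    print(exceeded, len(output))
```

This does not fix the collision itself; it only turns an unbounded hang into a bounded failure that the pipeline can report or retry.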

My suggestion would be to change sub tempname in tRNAscan-SE-2.0/lib/tRNAscanSE/Utils.pm so that it uses something other than the PID to guarantee uniqueness, maybe rand() or a time-based method.
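To illustrate the difference between the naming schemes, here is a rough Python sketch (the real code is Perl, and I have not reproduced it here). The `tscan<PID>.fpass` pattern is taken from the warning in the log above; the function names are hypothetical and only meant to show the idea.

```python
import os
import tempfile
import uuid


def pid_based_name(tmp_dir, suffix):
    """PID-style name, mimicking the tscan<PID>.fpass pattern in the log."""
    return os.path.join(tmp_dir, f"tscan{os.getpid()}{suffix}")


def uuid_based_name(tmp_dir, suffix):
    """Hypothetical alternative: a random UUID replaces the PID."""
    return os.path.join(tmp_dir, f"tscan{uuid.uuid4().hex}{suffix}")


def create_unique_file(tmp_dir, suffix):
    """Atomically create a new file; mkstemp never reuses an existing path."""
    fd, path = tempfile.mkstemp(prefix="tscan", suffix=suffix, dir=tmp_dir)
    os.close(fd)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Same process (or a later one that reuses the PID): identical name
        print(pid_based_name(tmp_dir, ".fpass"))
        # Random component: a different name on every call
        print(uuid_based_name(tmp_dir, ".fpass"))
        # Exclusive creation: the file is guaranteed to be new
        print(create_unique_file(tmp_dir, ".fpass"))
```

The PID-based name repeats whenever a PID is reused or a stale file is left behind, while the UUID-based name makes a clash extremely unlikely. Exclusive creation goes a step further, because the check and the creation happen in one step.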

Alternatively, you could change the call within R2DT to ask tRNAscan-SE to overwrite without asking, but if this really is two interpreters with the same PID, that will almost certainly be disastrous!

For now I can work around this by amending the TMPDIR variable so that it contains some randomly generated elements.
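In case it helps anyone else, the workaround amounts to something like the sketch below: each job gets its own freshly created directory, and TMPDIR points at it for the duration of the command. `run_with_private_tmpdir` and the `r2dt-` prefix are hypothetical names of my own, and this assumes the container passes TMPDIR through to the tool.

```python
import os
import subprocess
import sys
import tempfile


def run_with_private_tmpdir(command, base_dir=None, **run_kwargs):
    """Run `command` with TMPDIR set to a fresh, uniquely named directory.

    Hypothetical helper for illustration. The directory is created under
    base_dir (or the current TMPDIR, or the system default) and removed
    once the command has finished.
    """
    base = base_dir or os.environ.get("TMPDIR") or tempfile.gettempdir()
    with tempfile.TemporaryDirectory(prefix="r2dt-", dir=base) as private_tmp:
        # Copy the environment and override only TMPDIR
        env = dict(os.environ, TMPDIR=private_tmp)
        return subprocess.run(
            command,
            env=env,
            stdin=subprocess.DEVNULL,  # never wait for interactive input
            check=False,
            **run_kwargs,
        )


if __name__ == "__main__":
    # Stand-in command that just reports the TMPDIR it was given
    show_tmpdir = [sys.executable, "-c", "import os; print(os.environ['TMPDIR'])"]
    result = run_with_private_tmpdir(show_tmpdir, capture_output=True, text=True)
    print(result.stdout.strip())
```

Because the directory is removed afterwards, a later job that happens to get the same PID cannot trip over leftover files either.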

This is an interesting case! @patriciaplchan - would you have any advice?

I can only think of two scenarios in which this problem may occur.

  1. Two instances of tRNAscan-SE with the same PID are running on the same host. This is unlikely to happen.
  2. The temporary files from a previous run that used the same PID were not deleted at the end of the process. This should not happen either, unless it is caused by some container-specific settings.

I will add this as an issue in our internal tracking system and replace the PID with a UUID in the next release. Hopefully, that can solve the problem.

Great, thank you for looking into this @patriciaplchan!