
Fastq file names containing the string "sample" cause the pipeline to fail

bioruffo opened this issue · 3 comments

I was running JAFFA with a dummy file named just "sample_1.fastq.gz", and the pipeline failed at the last step, like this:

==================================== Stage compile_all_results =====================================
> options(echo=F)
Compiling the results from:
Done writing output jaffa_results.csv
Done writing jaffa_results.fasta
All Done.
ERROR: Expected output file jaffa_results.fasta could not be found

========================================= Pipeline Failed ==========================================

Expected output file jaffa_results.fasta could not be found

Use 'bpipe errors' to see output from failed commands.

Indeed, the file "jaffa_results.fasta" is never generated if the input file name contains the string "sample".
If I rename the same file to "sampl3_1.fastq.gz", the pipeline succeeds.

Apparently, this is caused by the function get_fusion_seqs() in scripts/get_fusion_seqs.bash. On line 36, it returns with no output if the first token of the "jaffa_results.csv" line being processed contains the string "sample":

  if [[ ${field1} =~ "sample" ]]

This check is presumably meant to skip the first (header) line of "jaffa_results.csv", which starts with "sample" as defined on line 40 of compile_results.R. However, because =~ performs a substring match here, it also matches any line whose first token (the file name) contains the string "sample".
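To illustrate the difference between the two operators, here is a minimal sketch. The helper names matches_substring and matches_exactly are hypothetical and not part of JAFFA, and the file names are just the ones from this report:

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not from JAFFA) contrasting the two comparisons.

# With a quoted right-hand side, =~ treats the pattern as a literal string
# and succeeds if it occurs anywhere in the first argument.
matches_substring() { [[ $1 =~ "sample" ]]; }

# With a quoted right-hand side, == succeeds only on an exact match.
matches_exactly() { [[ $1 == "sample" ]]; }

for name in sample sample_1.fastq.gz sampl3_1.fastq.gz; do
  if matches_substring "$name"; then regex=yes; else regex=no; fi
  if matches_exactly "$name"; then exact=yes; else exact=no; fi
  printf '%s\t=~:%s\t==:%s\n' "$name" "$regex" "$exact"
done
```

Only the literal header token "sample" passes both tests; "sample_1.fastq.gz" passes the =~ test but not the == test, which is the case that trips the pipeline.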

Besides the simplest fix of changing the =~ operator to ==, perhaps a stronger solution would be to alter this conditional to check for the second token:

  # NOTE must match the header as defined in compile_results.R
  if [[ ${field2} == "fusion" ]]

The logic behind the proposed change is this: the first token can collide with the header because it depends on the file/sample name, whereas the second token is safer. It is "fusion" in the header line, while in every subsequent line it consists of two fused gene names and should reasonably never be "fusion".
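As a rough sketch of how the second-token check behaves, the snippet below uses a hypothetical helper is_header_row (not part of JAFFA). It assumes unquoted, comma-separated fields, and the example rows are made up for illustration; the real script may split and quote fields differently:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from JAFFA): succeeds only for the header row,
# judged by the second field rather than the first.
# Assumption: fields are unquoted and separated by commas.
is_header_row() {
  local field1 field2 rest
  IFS=, read -r field1 field2 rest <<< "$1"
  # NOTE must match the header as defined in compile_results.R
  [[ ${field2} == "fusion" ]]
}

# Made-up rows: a header, then a data row whose sample name contains "sample".
for row in "sample,fusion,chrom1" "sample_1,BCR:ABL1,chr22"; do
  if is_header_row "$row"; then
    echo "skip header: $row"
  else
    echo "process row: $row"
  fi
done
```

Under these assumptions, a data row is processed even when its sample name contains "sample", because only the second field decides whether the row is the header.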

Hi Roberto,

Thank you very much for not only reporting this issue, but also finding the cause and suggesting a fix! I will add this to the next version of our code.


Fixed in commit 6bcad6a, along with #68 and #72.