metagenome-atlas/atlas

Error in rule apply_quality_filter

johnne opened this issue · 1 comments

  • I checked and didn't find a related issue, e.g. while typing the title
  • I got an error in the following rule(s): apply_quality_filter
  • I checked the log files indicated in the error message (and the cluster logs if submitted to a cluster)

Here is the relevant log output:

java -ea -Xmx51G -Xms51G -cp /crex/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/resources/conda_envs/7e5a7fea79fbf3816713ab3b8fc02eb3_/opt/bbmap-39.01-0/current/ jgi.BBDuk in1=m.c-2/sequence_quality_control/m.c-2_deduplicated_R1.fastq.gz in2=m.c-2/sequence_quality_control/m.c-2_deduplicated_R2.fastq.gz ref=/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/resources/adapters.fa interleaved=f out=m.c-2/sequence_quality_control/m.c-2_filtered_R1.fastq.gz out2=m.c-2/sequence_quality_control/m.c-2_filtered_R2.fastq.gz outs=m.c-2/logs/m.c-2_quality_filtering_stats.txt stats=m.c-2/logs/m.c-2_quality_filtering_stats.txt overwrite=true qout=33 trd=t hdist=1 k=27 ktrim=r mink=8 trimq=10 qtrim=rl threads=8 minlength=51 maxns=-1 minbasefrequency=0.05 ecco=t prealloc=t pigz=t unpigz=t -Xmx51G
Executing jgi.BBDuk [in1=m.c-2/sequence_quality_control/m.c-2_deduplicated_R1.fastq.gz, in2=m.c-2/sequence_quality_control/m.c-2_deduplicated_R2.fastq.gz, ref=/proj/snic2020-5-486/nobackup/SMS-23-6668-micegut/resources/adapters.fa, interleaved=f, out=m.c-2/sequence_quality_control/m.c-2_filtered_R1.fastq.gz, out2=m.c-2/sequence_quality_control/m.c-2_filtered_R2.fastq.gz, outs=m.c-2/logs/m.c-2_quality_filtering_stats.txt, stats=m.c-2/logs/m.c-2_quality_filtering_stats.txt, overwrite=true, qout=33, trd=t, hdist=1, k=27, ktrim=r, mink=8, trimq=10, qtrim=rl, threads=8, minlength=51, maxns=-1, minbasefrequency=0.05, ecco=t, prealloc=t, pigz=t, unpigz=t, -Xmx51G]
Version 39.01                                                                                                                                                       
                                                                                                                                                                    
Set INTERLEAVED to false                                                                                                                                            
Set threads to 8                                                                                                                                                    
Initial size set to 506877058                                                                                                                                       
maskMiddle was disabled because useShortKmers=true                                                                                                                  
Exception in thread "main" java.lang.AssertionError: Duplicate file m.c-2/logs/m.c-2_quality_filtering_stats.txt was specified for multiple output streams.         
        at shared.Tools.testOutputFiles(Tools.java:133)                                                                                                             
        at jgi.BBDuk.<init>(BBDuk.java:919)                                                                                                                         
        at jgi.BBDuk.main(BBDuk.java:78)


Atlas version
2.15.0

Additional context
It seems that the stats file (specified as stats="{sample}/logs/{sample}_quality_filtering_stats.txt" in the qc.smk rules file) is passed to both outs= and stats=. My guess is that this comes from

            outputs=(
                lambda wc, output: "out={0} out2={1} outs={2}".format(*output)
                if PAIRED_END
                else "out={0}".format(*output)
            ),

starting on line 372 in workflow/rules/qc.smk.
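
To illustrate the guess above, here is a minimal sketch of how a purely positional format string could hand the stats file to outs=. The contents of the output list are an assumption for illustration (the actual rule outputs are not shown in this issue), and duplicated_targets is a hypothetical helper, not part of atlas or BBDuk.

```python
from collections import defaultdict

# Hypothetical output list: only two read files are present, so the
# third positional entry happens to be the stats file.
output = [
    "m.c-2/sequence_quality_control/m.c-2_filtered_R1.fastq.gz",
    "m.c-2/sequence_quality_control/m.c-2_filtered_R2.fastq.gz",
    "m.c-2/logs/m.c-2_quality_filtering_stats.txt",
]
stats = "m.c-2/logs/m.c-2_quality_filtering_stats.txt"

# Same format string as in the rule: {2} is whatever sits at index 2.
outputs_arg = "out={0} out2={1} outs={2}".format(*output)
command_args = f"{outputs_arg} stats={stats}"


def duplicated_targets(args):
    """Return {path: [keys]} for paths assigned to more than one key=value argument."""
    seen = defaultdict(list)
    for token in args.split():
        key, _, value = token.partition("=")
        seen[value].append(key)
    return {path: keys for path, keys in seen.items() if len(keys) > 1}


duplicates = duplicated_targets(command_args)
print(duplicates)
```

With this assumed list, the stats path shows up under both outs and stats, which matches the "Duplicate file ... was specified for multiple output streams" assertion in the log.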

OK, I found the error. I think it was caused by my sample file having the Reads_QC_R1 and Reads_QC_R2 columns, but with empty values for my samples. Removing those columns seems to have fixed the issue.
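
For anyone hitting the same thing, the workaround can be sketched roughly as below: find the Reads_QC_R1 / Reads_QC_R2 columns that are empty for every sample and write the table back without them. The tab-separated layout, the other column names, and the file paths in the example table are assumptions for illustration; drop_empty_columns is a hypothetical helper, not an atlas function.

```python
import csv
import io

# Hypothetical sample table with empty Reads_QC_* columns (layout assumed).
SAMPLE_TABLE = "\n".join(
    [
        "\t".join(["sample", "Reads_raw_R1", "Reads_raw_R2", "Reads_QC_R1", "Reads_QC_R2"]),
        "\t".join(["m.c-2", "raw/m.c-2_R1.fastq.gz", "raw/m.c-2_R2.fastq.gz", "", ""]),
    ]
) + "\n"


def drop_empty_columns(table_text, candidates):
    """Drop candidate columns that are blank in every row; return (dropped, new_text)."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    rows = list(reader)
    fieldnames = reader.fieldnames or []
    empty = [
        col
        for col in candidates
        if col in fieldnames and all(not (row[col] or "").strip() for row in rows)
    ]
    kept = [col for col in fieldnames if col not in empty]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, delimiter="\t", extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return empty, out.getvalue()


dropped, cleaned = drop_empty_columns(SAMPLE_TABLE, ["Reads_QC_R1", "Reads_QC_R2"])
print(dropped)
print(cleaned)
```

Columns that contain a value for at least one sample are left untouched, so the helper only removes columns that carry no information at all.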