pipeline terminating early
nick-youngblut opened this issue · 2 comments
Description of the bug
I just want to use the pipeline for QC'ing my nanopore data, but it prematurely terminates after the initial step of the pipeline:
[70/93224f] process > NFCORE_NANOSEQ:NANOSEQ:INPUT_CHECK:SAMPLESHEET_CHECK (SampleSheet.csv) [100%] 1 of 1 ✔
[- ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:NANOPLOT -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:FASTQC -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_RENAME -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:CUSTOM_DUMPSOFTWAREVERSIONS -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:MULTIQC
-[nf-core/nanoseq] Pipeline completed successfully-
Command used and terminal output
nextflow run main.nf \
--input SampleSheet.csv \
--outdir path/to/output/ \
--protocol cDNA \
--skip_demultiplexing \
--skip_vc \
--skip_sv \
--skip_alignment \
--skip_differential_analysis \
--skip_quantification \
--skip_modification_analysis \
--skip_fusion_analysis \
-profile docker
Relevant files
My SampleSheet.csv file:
group,replicate,barcode,input_file,fasta,gtf
sample1,1,17,/path/to/basecalling/output/basecalling/barcode17/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_11240_0.fastq.gz,GRCh38,
sample1,2,17,/path/to/basecalling/output/basecalling/barcode17/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_10664_0.fastq.gz,GRCh38,
sample2,1,18,/path/to/basecalling/output/basecalling/barcode18/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_11240_0.fastq.gz,GRCh38,
sample2,2,18,/path/to/basecalling/output/basecalling/barcode18/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_10664_0.fastq.gz,GRCh38,
System information
- Server: Ubuntu 22.04.3 LTS
- Docker: 24.0.6
- Nextflow: 23.10.1.5891
It appears that the issue is due to --skip_demultiplexing. A simple reprex:
nextflow run main.nf --outdir /home/nickyoungblut/projects/SspArc0008_10x_cDNA_longRead/data/SspArc0008_10x_cDNA_longRead/nanoseq_TEST/ --protocol cDNA --skip_demultiplexing -profile docker,test
[53/2f0a9a] process > NFCORE_NANOSEQ:NANOSEQ:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_nobc_dx.csv) [100%] 1 of 1 ✔
executor > local (1)
[53/2f0a9a] process > NFCORE_NANOSEQ:NANOSEQ:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_nobc_dx.csv) [100%] 1 of 1 ✔
[- ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:NANOPLOT -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:FASTQC -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:PREPARE_GENOME:GET_CHROM_SIZES -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:PREPARE_GENOME:GTF2BED -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:PREPARE_GENOME:SAMTOOLS_FAIDX -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:ALIGN_MINIMAP2:MINIMAP2_INDEX -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:ALIGN_MINIMAP2:MINIMAP2_ALIGN -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:SAMTOOLS_VIEW_BAM -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:SAMTOOLS_SORT -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:SAMTOOLS_INDEX -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAGSTAT -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXSTATS -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:CUSTOM_DUMPSOFTWAREVERSIONS -
[- ] process > NFCORE_NANOSEQ:NANOSEQ:MULTIQC -
-[nf-core/nanoseq] Pipeline completed successfully-
The QC steps (e.g., NanoPlot) appear to be tied directly to the demultiplexing section of the pipeline, instead of being applied to all demultiplexed files, whether those were provided by the user or produced by the pipeline:
if (!params.skip_demultiplexing) {
    /*
     * MODULE: Demultipexing using qcat
     */
    QCAT ( ch_input_path )
    ch_fastq = Channel.empty()
    QCAT.out.fastq
        .flatten()
        .map { it -> [ it, it.baseName.substring(0,it.baseName.lastIndexOf('.'))] }
        .join(ch_sample, by: 1) // join on barcode
        .map { it -> [ it[2], it[1], it[3], it[4], it[5], it[6] ] }
        .set { ch_fastq }
    ch_software_versions = ch_software_versions.mix(QCAT.out.versions.ifEmpty(null))
} else {
    if (!params.skip_alignment) {
        ch_sample
            .map { it -> if (it[6].toString().endsWith('.gz')) [ it[0], it[6], it[2], it[1], it[4], it[5] ] }
            .set { ch_fastq }
    } else {
        ch_fastq = Channel.empty()
    }
}
If params.skip_demultiplexing and params.skip_alignment are both set (or if only demultiplexing is skipped and it[6].toString().endsWith('.gz') is false), then ch_fastq ends up empty, so there are no fastq files to process further in the pipeline.
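A minimal, standalone sketch of that branch, intended only to reproduce the behavior outside the pipeline. The tuple contents are hypothetical placeholders (only position 6, taken here to be the input file, matters), and the QCAT branch is replaced by a pass-through, so this is not the pipeline's actual code:

```groovy
// main.nf (DSL2): toy reproduction of the branching, with no real processes
params.skip_demultiplexing = true
params.skip_alignment      = true

workflow {
    // Hypothetical 7-element tuple standing in for one ch_sample item
    ch_sample = Channel.of(
        ['sample1', 1, '17', 'x', 'x', 'x', '/path/to/reads.fastq.gz']
    )

    if (!params.skip_demultiplexing) {
        // Stand-in for the QCAT branch
        ch_fastq = ch_sample
    } else if (!params.skip_alignment) {
        ch_sample
            .map { it -> if (it[6].toString().endsWith('.gz')) [ it[0], it[6], it[2], it[1], it[4], it[5] ] }
            .set { ch_fastq }
    } else {
        ch_fastq = Channel.empty()
    }

    // Emit a notice when nothing would reach the QC steps
    ch_fastq
        .ifEmpty('ch_fastq is empty: the QC steps would receive no input')
        .view()
}
```

Run with `nextflow run main.nf`; with both parameters set to true as above, the notice should be the only thing printed. Passing `--skip_alignment false` on the command line should print the remapped tuple instead.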
It would also greatly help if the code documented which column each index value refers to in:
.map { it -> [ it[2], it[1], it[3], it[4], it[5], it[6] ] }
and:
.map { it -> if (it[6].toString().endsWith('.gz')) [ it[0], it[6], it[2], it[1], it[4], it[5] ] }
Changing ch_fastq = Channel.empty() to ch_sample.map { it -> [ it[0], it[6] ] }.set { ch_fastq } allows NANOPLOT and FASTQC to complete.
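For context, a sketch of how that change would sit in the else-branch. It is written as a standalone toy workflow so it can be run on its own: the ch_sample contents are hypothetical, and it assumes position 0 holds the sample name and position 6 the input file, which is what the edit above implies but is not confirmed here:

```groovy
// main.nf (DSL2): toy version of the branch taken when --skip_demultiplexing is set
params.skip_alignment = true

workflow {
    // Hypothetical stand-in for ch_sample (assumed: it[0] = sample name, it[6] = input file)
    ch_sample = Channel.of(
        ['sample1_R1', 'x', 'x', 'x', 'x', 'x', '/path/to/a.fastq.gz'],
        ['sample2_R1', 'x', 'x', 'x', 'x', 'x', '/path/to/b.fastq.gz']
    )

    if (!params.skip_alignment) {
        // Unchanged from the quoted pipeline code
        ch_sample
            .map { it -> if (it[6].toString().endsWith('.gz')) [ it[0], it[6], it[2], it[1], it[4], it[5] ] }
            .set { ch_fastq }
    } else {
        // Proposed change: forward name and reads for QC instead of Channel.empty()
        ch_sample
            .map { it -> [ it[0], it[6] ] }
            .set { ch_fastq }
    }

    // Inspect what the QC steps would receive
    ch_fastq.view()
}
```

Note that the two branches emit tuples of different lengths (six elements versus two), so any downstream step that indexes beyond position 1 would need checking before adopting this in the real workflow.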
Still, the MultiQC report is not generated, which seems to be due to an unmet dependency at:
MULTIQC (
    ch_multiqc_config,
    ch_multiqc_custom_config.collect().ifEmpty([]),
    ch_fastqc_multiqc.ifEmpty([]),
    ch_samtools_multiqc.collect().ifEmpty([]),
    ch_featurecounts_gene_multiqc.ifEmpty([]),
    ch_featurecounts_transcript_multiqc.ifEmpty([]),
    CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect(),
    ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')
)
With my edits (above), ch_fastqc_multiqc is not empty, so I would expect MULTIQC to run.
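One possible explanation, offered as a guess rather than a diagnosis: a Nextflow process is only invoked once every one of its input channels has emitted a value, so a single input that never emits (for example a collect() on the output of a process that did not run, with no ifEmpty([]) fallback) is enough to hold it back. The sketch below illustrates that general behavior with a hypothetical REPORT process and made-up file names; it is not the pipeline's MULTIQC module:

```groovy
// main.nf (DSL2): a process gated by one empty input channel
process REPORT {
    input:
    val fastqc_files
    val versions

    output:
    stdout

    script:
    """
    echo "fastqc inputs: ${fastqc_files.size()}, version files: ${versions.size()}"
    """
}

workflow {
    ch_fastqc   = Channel.of('a_fastqc.zip', 'b_fastqc.zip')
    // Stands in for the output of an upstream process that never ran
    ch_versions = Channel.empty()

    // collect() on an empty channel emits nothing, so REPORT is never invoked
    REPORT(ch_fastqc.collect(), ch_versions.collect())

    // Supplying a placeholder would let it run; swap this in for the call above to compare:
    // REPORT(ch_fastqc.collect(), ch_versions.collect().ifEmpty([]))
}
```

If this is what is happening, checking whether each MULTIQC input actually emits (for instance by appending .view() to each channel) would narrow down which one is the unmet dependency.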
The following edit works:
MULTIQC (
    ch_multiqc_config,
    ch_multiqc_custom_config.collect().ifEmpty([]),
    ch_fastqc_multiqc.collect().ifEmpty([])//,
    //ch_samtools_multiqc.collect().ifEmpty([]),
    //ch_featurecounts_gene_multiqc.ifEmpty([]),
    //ch_featurecounts_transcript_multiqc.ifEmpty([]),
    //CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect(),
    //ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')
)
Note: I updated process MULTIQC accordingly.
Also note: I had to add collect() to ch_fastqc_multiqc.
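A small sketch of why collect() may matter here, using made-up file names. Without it, the channel emits each FastQC result as a separate item; with it, the channel emits a single list, which is the shape a report step that should run once over all samples would typically want:

```groovy
// main.nf (DSL2): per-item emissions versus a single collected list
workflow {
    // Hypothetical FastQC outputs, one per sample
    ch_fastqc = Channel.of('a_fastqc.zip', 'b_fastqc.zip', 'c_fastqc.zip')

    // Without collect(): three separate emissions, one per file
    ch_fastqc.view { "item: $it" }

    // With collect(): one emission holding all three files
    ch_fastqc.collect().view { "list: $it" }
}
```

Whether this fully explains the need for collect() in the MULTIQC call is an assumption; it depends on how the other inputs to that process are shaped.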