pipeline terminating early

Question

nick-youngblut opened this issue 10 months ago · 2 comments

Description of the bug

I just want to use the pipeline for QC'ing my nanopore data, but it prematurely terminates after the initial step of the pipeline:

[70/93224f] process > NFCORE_NANOSEQ:NANOSEQ:INPUT_CHECK:SAMPLESHEET_CHECK (SampleSheet.csv) [100%] 1 of 1 ✔
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:NANOPLOT                -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:FASTQC                  -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_RENAME                                      -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:CUSTOM_DUMPSOFTWAREVERSIONS                     -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:MULTIQC
-[nf-core/nanoseq] Pipeline completed successfully-

Command used and terminal output

nextflow run main.nf \
  --input SampleSheet.csv \
  --outdir path/to/output/ \
  --protocol cDNA \
  --skip_demultiplexing \
  --skip_vc \
  --skip_sv \
  --skip_alignment \
  --skip_differential_analysis \
  --skip_quantification \
  --skip_modification_analysis \
  --skip_fusion_analysis \
  -profile docker

Relevant files

My SampleSheet.csv file:

group,replicate,barcode,input_file,fasta,gtf
sample1,1,17,/path/to/basecalling/output/basecalling/barcode17/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_11240_0.fastq.gz,GRCh38,
sample1,2,17,/path/to/basecalling/output/basecalling/barcode17/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_10664_0.fastq.gz,GRCh38,
sample2,1,18,/path/to/basecalling/output/basecalling/barcode18/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_11240_0.fastq.gz,GRCh38,
sample2,2,18,/path/to/basecalling/output/basecalling/barcode18/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_10664_0.fastq.gz,GRCh38,

System information

Server: Ubuntu 22.04.3 LTS
Docker: 24.0.6
Nextflow: 23.10.1.5891

Answer 1 · 2024-03-13T19:53:32.000Z

It appears that the issue is due to --skip_demultiplexing. A simple reprex:

nextflow run main.nf   --outdir /home/nickyoungblut/projects/SspArc0008_10x_cDNA_longRead/data/SspArc0008_10x_cDNA_longRead/nanoseq_TEST/   --protocol cDNA   --skip_demultiplexing   -profile docker,test

[53/2f0a9a] process > NFCORE_NANOSEQ:NANOSEQ:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_nobc_dx.csv)      [100%] 1 of 1 ✔
executor >  local (1)
[53/2f0a9a] process > NFCORE_NANOSEQ:NANOSEQ:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_nobc_dx.csv)      [100%] 1 of 1 ✔
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:NANOPLOT                             -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:FASTQC                               -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:PREPARE_GENOME:GET_CHROM_SIZES                               -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:PREPARE_GENOME:GTF2BED                                       -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:PREPARE_GENOME:SAMTOOLS_FAIDX                                -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:ALIGN_MINIMAP2:MINIMAP2_INDEX                                -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:ALIGN_MINIMAP2:MINIMAP2_ALIGN                                -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:SAMTOOLS_VIEW_BAM                    -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:SAMTOOLS_SORT                        -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:SAMTOOLS_INDEX                       -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS    -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAGSTAT -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXSTATS -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:CUSTOM_DUMPSOFTWAREVERSIONS                                  -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:MULTIQC                                                      -
-[nf-core/nanoseq] Pipeline completed successfully-

The QC steps (e.g., NanoPlot) appear to be tied directly to the demultiplexing section of the pipeline, rather than being applied to all demultiplexed files downstream (whether provided by the user or demultiplexed by the pipeline):

    if (!params.skip_demultiplexing) {

        /*
         * MODULE: Demultipexing using qcat
         */
        QCAT ( ch_input_path )
        ch_fastq = Channel.empty()
        QCAT.out.fastq
            .flatten()
            .map { it -> [ it, it.baseName.substring(0,it.baseName.lastIndexOf('.'))] }
            .join(ch_sample, by: 1) // join on barcode
            .map { it -> [ it[2], it[1], it[3], it[4], it[5], it[6] ] }
            .set { ch_fastq }
        ch_software_versions = ch_software_versions.mix(QCAT.out.versions.ifEmpty(null))
    } else {
        if (!params.skip_alignment) {
            ch_sample
                .map { it -> if (it[6].toString().endsWith('.gz')) [ it[0], it[6], it[2], it[1], it[4], it[5] ] }
                .set { ch_fastq }
        } else {
            ch_fastq = Channel.empty()
        }
    }

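If params.skip_demultiplexing and params.skip_alignment are both set, then ch_fastq = Channel.empty(). The same appears to happen when params.skip_demultiplexing is set and it[6].toString().endsWith('.gz') is false. Either way, there are no fastq files to process further in the pipeline.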

It would greatly help to document which columns the index values refer to in:

.map { it -> [ it[2], it[1], it[3], it[4], it[5], it[6] ] }

and:

.map { it -> if (it[6].toString().endsWith('.gz')) [ it[0], it[6], it[2], it[1], it[4], it[5] ] }
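
One way to make the indices self-documenting is to name the closure parameters instead of indexing into `it`. This is a minimal standalone sketch, not the pipeline's code. The real layout of ch_sample is not documented in this thread, so `sample` and `f1` to `f5` are placeholder names; only `input_file` at position 6 is suggested by the `.gz` check above.

```groovy
// named_fields.nf: hypothetical standalone script, not part of nf-core/nanoseq
workflow {
    // Placeholder 7-element items standing in for ch_sample.
    // Only position 6 (the input file) is inferred from the issue text.
    ch_sample = Channel.of(
        [ 'sample1_R1', 'f1', 'f2', 'f3', 'f4', 'f5', '/path/to/reads_1.fastq.gz' ],
        [ 'sample1_R2', 'f1', 'f2', 'f3', 'f4', 'f5', '/path/to/reads_2.fastq.gz' ]
    )

    ch_sample
        // A multi-parameter closure unpacks each list item by position,
        // so the reordering is readable without counting indices.
        .map { sample, f1, f2, f3, f4, f5, input_file ->
            if (input_file.toString().endsWith('.gz')) [ sample, input_file, f2, f1, f4, f5 ]
        }
        .view()
}
```

The closure must declare exactly as many parameters as each item has elements, so this only works once the tuple layout is known and stable.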

Answer 2 · 2024-03-13T20:33:17.000Z

Changing ch_fastq = Channel.empty() to ch_sample.map { it -> [ it[0], it[6] ] }.set { ch_fastq } enables the completion of NANOPLOT and FASTQC.
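
A minimal sketch of that change in isolation, assuming a standalone script rather than the real workflow. The items in ch_sample are placeholders (only positions 0 and 6 matter here), and params.skip_alignment is set to true to exercise the branch that used to return an empty channel.

```groovy
// skip_alignment_branch.nf: hypothetical standalone script, not part of nf-core/nanoseq
params.skip_alignment = true

workflow {
    // Placeholder 7-element items; the real layout of ch_sample is an assumption.
    ch_sample = Channel.of(
        [ 'sample1_R1', 'f1', 'f2', 'f3', 'f4', 'f5', '/path/to/reads_1.fastq.gz' ],
        [ 'sample1_R2', 'f1', 'f2', 'f3', 'f4', 'f5', '/path/to/reads_2.fastq.gz' ]
    )

    if (!params.skip_alignment) {
        // Unchanged branch from the pipeline
        ch_sample
            .map { it -> if (it[6].toString().endsWith('.gz')) [ it[0], it[6], it[2], it[1], it[4], it[5] ] }
            .set { ch_fastq }
    } else {
        // Previously: ch_fastq = Channel.empty()
        // Now: forward the two fields used in the edit above
        ch_sample
            .map { it -> [ it[0], it[6] ] }
            .set { ch_fastq }
    }

    // Print what would reach the QC steps
    ch_fastq.view()
}
```

Running it with `nextflow run skip_alignment_branch.nf` should print one two-element item per placeholder row. Adding `--skip_alignment false` switches to the original branch for comparison.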

Still, the MultiQC report is not generated, which seems to be due to an unmet dependency at:

        MULTIQC (
        ch_multiqc_config,
        ch_multiqc_custom_config.collect().ifEmpty([]),
        ch_fastqc_multiqc.ifEmpty([]),
        ch_samtools_multiqc.collect().ifEmpty([]),
        ch_featurecounts_gene_multiqc.ifEmpty([]),
        ch_featurecounts_transcript_multiqc.ifEmpty([]),
        CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect(),
        ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')
        )

With my edits (above), ch_fastqc_multiqc is not empty, so I would think that MULTIQC would run.

The following edit works:

        MULTIQC (
        ch_multiqc_config,
        ch_multiqc_custom_config.collect().ifEmpty([]),
        ch_fastqc_multiqc.collect().ifEmpty([])//,
        //ch_samtools_multiqc.collect().ifEmpty([]),
        //ch_featurecounts_gene_multiqc.ifEmpty([]),
        //ch_featurecounts_transcript_multiqc.ifEmpty([]),
        //CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect(),
        //ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')
        )

Note: I updated process MULTIQC accordingly.

Also note: I had to add collect() to ch_fastqc_multiqc.
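
A likely reason collect() matters here is that ch_fastqc_multiqc emits one item per sample, while MULTIQC is meant to receive all reports in a single call. The sketch below is a standalone script with hypothetical file names standing in for FastQC outputs; it only shows the difference in what the channel emits.

```groovy
// collect_demo.nf: hypothetical standalone script, not part of nf-core/nanoseq
workflow {
    // Hypothetical names standing in for per-sample FastQC outputs
    ch_fastqc_multiqc = Channel.of('sample1_fastqc.zip', 'sample2_fastqc.zip')

    // Without collect(): each file is a separate emission
    ch_fastqc_multiqc.view { "item: $it" }

    // With collect(): a single emission holding a list of every file.
    // ifEmpty([]) supplies an empty list when the channel has no items.
    ch_fastqc_multiqc.collect().ifEmpty([]).view { "collected: $it" }
}
```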