epi2me-labs/wf-single-cell

Very small output BAM file (wrong BAM file processed by tag_bams?)

porchard opened this issue · 3 comments

Operating System

Other Linux (please specify below)

Other Linux

Red Hat Enterprise Linux 8.6

Workflow Version

v1.1.0

Workflow Execution

Command line

EPI2ME Version

No response

CLI command run

nextflow run -resume --fastq /scratch/scjp_root/scjp0/porchard/2024-03-ONT/data/fastq/9266-VD-1 --kit_name multiome --kit_version v1 --expected_cells 7000 --sample 9266-VD-1 --ref_genome_dir /scratch/scjp_root/scjp0/porchard/2024-03-ONT/data/ref-data -profile singularity /scratch/scjp_root/scjp0/porchard/2024-03-ONT/pipelines/wf-single-cell-41eceab/main.nf

Workflow Execution - CLI Execution Profile

singularity

What happened?

The workflow ran successfully, but the output BAM files contain only a very small number of reads (e.g., the chr21 BAM file {sample}.chr21.tagged.bam has 12,104 reads). I previously ran v1.0.3 of the workflow successfully, and it produced more complete BAM files (e.g., the chr21 BAM file had 800,782 reads), but I noticed the issue described in #78 and wanted to re-run with the updated version of the pipeline.

I've had a quick look at the pipeline and believe I know the source of the issue. In v1.1.0, the process align:align_to_ref runs in parallel, processing each chunked fastq individually (in my case, 64 chunks, producing 64 BAM files). In process_bams.nf, these BAMs are merged into a single BAM by the combine_bams_and_tags process, but the tag_bams process later in process_bams.nf takes the original channel of 64 BAM files. Should tag_bams be accepting the combine_bams_and_tags.out.merged_bam channel instead?
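
As a rough sanity check on this hypothesis, the arithmetic below compares the two chr21 read counts quoted above against the number of chunks. It is only a sketch: it assumes the 64 chunks are similar in size and that v1.0.3 and v1.1.0 would otherwise yield comparable read counts, neither of which has been verified here. The variable names are illustrative and not taken from the workflow.

```python
# Read counts quoted in this report for the chr21 tagged BAM.
reads_v1_0_3 = 800_782   # chr21 reads from the v1.0.3 run
reads_v1_1_0 = 12_104    # chr21 reads from the v1.1.0 run
n_chunks = 64            # fastq chunks, one BAM per chunk in v1.1.0

# Assumption: chunks are roughly equal in size, so a single chunk
# should contribute about total / n_chunks reads.
expected_single_chunk = reads_v1_0_3 / n_chunks

# Fraction of the v1.0.3 read count that survived in v1.1.0,
# compared with the fraction a single chunk would represent.
observed_fraction = reads_v1_1_0 / reads_v1_0_3
single_chunk_fraction = 1 / n_chunks

print(f"expected reads from one chunk: {expected_single_chunk:.0f}")
print(f"observed fraction of v1.0.3:   {observed_fraction:.4f}")
print(f"one chunk as a fraction:       {single_chunk_fraction:.4f}")
```

A single chunk would be expected to hold roughly 12,500 of the 800,782 reads, which is close to the 12,104 observed. That is consistent with only one chunk's BAM reaching the tagged output for chr21, though it does not prove it.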

Relevant log output

executor >  slurm (386)
[9e/540ec6] process > fastcat (1)                    [100%] 1 of 1, cached: 1 ✔
[88/0fcc68] process > parse_kit_metadata (1)         [100%] 1 of 1 ✔
[5b/2f6671] process > pipeline:getVersions           [100%] 1 of 1, cached: 1 ✔
[ea/16fdd7] process > pipeline:getParams             [100%] 1 of 1, cached: 1 ✔
[5d/89b260] process > pipeline:chunkReads (1)        [100%] 1 of 1, cached: 1 ✔
[24/b0edc1] process > pipeline:stranding:call_ada... [100%] 64 of 64, cached:...
[9f/88af8d] process > pipeline:stranding:combine_... [100%] 1 of 1, cached: 1 ✔
[ac/c7ab0a] process > pipeline:stranding:summariz... [100%] 1 of 1, cached: 1 ✔
[7c/1679fb] process > pipeline:stranding:extract_... [100%] 64 of 64, cached:...
[bb/c0bb46] process > pipeline:align:call_paftools   [100%] 1 of 1, cached: 1 ✔
[32/6a3ce9] process > pipeline:align:get_chrom_sizes [100%] 1 of 1, cached: 1 ✔
[28/2c8f9f] process > pipeline:align:build_minima... [100%] 1 of 1, cached: 1 ✔
[35/7b95a5] process > pipeline:align:align_to_ref... [100%] 64 of 64, cached:...
[93/5ae363] process > pipeline:mergeTags (64)        [100%] 64 of 64, cached:...
[a6/79c4ff] process > pipeline:process_bams:split... [100%] 1 of 1, cached: 1 ✔
[f0/748c03] process > pipeline:process_bams:get_c... [100%] 64 of 64, cached:...
[87/70cc75] process > pipeline:process_bams:gener... [100%] 1 of 1, cached: 1 ✔
[ec/91ebe8] process > pipeline:process_bams:assig... [100%] 64 of 64, cached:...
[6f/e5c13b] process > pipeline:process_bams:combi... [100%] 1 of 1 ✔
[2c/a92a25] process > pipeline:process_bams:strin... [100%] 117 of 117 ✔
[7e/5d7130] process > pipeline:process_bams:align... [100%] 117 of 117 ✔
[f1/ce90ae] process > pipeline:process_bams:assig... [100%] 25 of 25 ✔
[dd/ee65f4] process > pipeline:process_bams:clust... [100%] 25 of 25 ✔
[05/8c6714] process > pipeline:process_bams:tag_b... [100%] 25 of 25 ✔
[6f/390216] process > pipeline:process_bams:combi... [100%] 1 of 1 ✔
[24/e41e1a] process > pipeline:process_bams:combi... [100%] 1 of 1 ✔
[f5/7abafd] process > pipeline:process_bams:umi_g... [100%] 1 of 1 ✔
[4c/78477c] process > pipeline:process_bams:const... [100%] 1 of 1 ✔
[4a/fd5023] process > pipeline:process_bams:proce... [100%] 1 of 1 ✔
[49/a2ffc8] process > pipeline:process_bams:pack_... [100%] 1 of 1 ✔
[41/9703ec] process > pipeline:process_bams:merge... [100%] 1 of 1 ✔
[5e/ae2c4a] process > pipeline:prepare_report_dat... [100%] 1 of 1 ✔
[b3/8298b2] process > pipeline:makeReport (1)        [100%] 1 of 1 ✔
[f8/97ecad] process > output (60)                    [100%] 65 of 65 ✔
[c9/e60723] process > output_report (1)              [100%] 1 of 1 ✔
Completed at: 19-Mar-2024 14:59:27
Duration    : 5h 35m 54s
CPU hours   : 276.0 (72.8% cached)
Succeeded   : 386
Cached      : 395

Application activity log entry

No response

Were you able to successfully run the latest version of the workflow with the demo data?

yes

Other demo data information

No response

Hi @porchard
I'm sorry that you're encountering an issue with the workflow, and thanks for bringing it to our attention.
You are right about both the cause and the remedy. I'll get a fix in place for this ASAP.

I ran into this in my output as well. I applied porchard's one-line fix, and the BAMs are now closer in size to those from the pipeline run that completed in January.

Thanks @ktpolanski for confirming that change sorted things for you. I can confirm we have folded the change into our development branch.